CLASSIFYING CLUSTERING SCHEMES 

GUNNAR CARLSSON AND FACUNDO MEMOLI 



Abstract. Many clustering schemes are defined by optimizing an objective function defined
on the partitions of the underlying set of a finite metric space. In this paper, we
construct a framework for studying what happens when we instead impose various structural
conditions on the clustering schemes, under the general heading of functoriality.

Functoriality refers to the idea that one should be able to compare the results of clustering 
algorithms as one varies the data set, for example by adding points or by applying functions 
to it. We show that within this framework, one can prove a theorem analogous to one of J.
Kleinberg [Kle02], in which, for example, one obtains an existence and uniqueness theorem
instead of a non-existence result.

We obtain a full classification of all clustering schemes satisfying a condition we refer 
to as excisiveness. The classification can be changed by varying the notion of maps of 
finite metric spaces. The conditions occur naturally when one considers clustering as the 
statistical version of the geometric notion of connected components. By varying the degree 
of functoriality that one requires from the schemes it is possible to construct richer families 
of clustering schemes that exhibit sensitivity to density. 

1. Introduction 

Clustering techniques play a very central role in various parts of data analysis. Methods of
this type can give important clues to the structure of data sets, and therefore suggest
results and hypotheses in the underlying science. There are many interesting methods of 
clustering available, which have been applied to good effect in dealing with many datasets 
of interest, and are regarded as important methods in exploratory data analysis. 

Most methods begin with a data set equipped with a notion of distance or metric. Starting 
with this input, a method might select a partition of the data set as one which optimizes a 
choice of an objective function. Other methods such as single, average, and complete linkage 
define clusterings, or rather nested families of clusterings, using a linkage function defined 
between clusters. 

Desirable properties of clustering algorithms come from practitioners who have intuitive 
notions of what is a good clustering: "they know it when they see it". This is of course not
satisfactory and a theoretical understanding needs to be developed. However, one thing this
intuition reflects is the fact that density needs to be incorporated in the clustering procedures.
Single linkage clustering, a procedure that enjoys several nice theoretical properties,
is notorious for its insensitivity to density, which is manifested in the so-called chaining effect
[LW67].

Other methods such as average linkage, complete linkage [JD88, Chapter 3] and k-means
share the property that they exhibit some sort of sensitivity to density, but are unstable
in a sense which has been made theoretically precise [CM10a] and are therefore not well
supported by theory. We believe that this disconnect between theory and practice should
not exist, and in particular, in [CM10b] we have constructed a theoretically sound framework
that incorporates density via the use of 2-dimensional persistence ideas.

Given that there is relatively little theory surrounding clustering methods, the subject is 
regarded by many as consisting of a collection of ad hoc methods for which it is difficult to 
generate a coherent picture [GvLW09]. One exception to this rule is a paper of J. Kleinberg
[Kle02], in which he proves a theorem which demonstrates the impossibility of constructing a
clustering algorithm with three seemingly plausible properties. Kleinberg regards a clustering
scheme as a map $\mathfrak{C}$ that assigns each finite metric space $(X, d_X)$ one of its partitions. His
"axioms" are

• Scale invariance: $\mathfrak{C}(X, d_X) = \mathfrak{C}(X, \lambda \cdot d_X)$ for all $\lambda > 0$;

• Surjectivity: for all partitions $P$ of $X$ there exists a metric $d_X$ on $X$ with $\mathfrak{C}(X, d_X) = P$; and

• Consistency: upon reducing distances between points in the same cluster of $X$ (as
produced by $\mathfrak{C}$), and increasing distances between points in different clusters, the
result of applying the method to the new metric does not change.

Theorem 1.1 ([Kle02]). There exists no clustering algorithm that simultaneously satisfies 
scale invariance, surjectivity and consistency. 

Kleinberg also points out that his results shed light on the trade-offs one has to make in 
choosing clustering algorithms. In this paper, we produce a variation on this theme, which 
we believe also has implications for how one thinks about and applies clustering algorithms. 

In our earlier paper [CM10a], in the context of hierarchical methods, we have established
a variant on Kleinberg's theorem, in which instead of impossibility, one actually proves an 
existence and uniqueness result, again assuming three plausible properties, slightly different 
from Kleinberg's. In the present paper, we make one more step towards understanding 
the theoretical foundations of clustering methods by expanding on the theme of [CM10a]
and developing a context in which one can obtain much richer classification results, which 
include versions of a number of familiar methods, including analogues of the clique clustering 
methods familiar in network and graph problems. We now describe our context and results. 

1.1. Overview of our methods and results. We regard clustering as the "statistical 
version" of the geometric notion of constructing the connected components of a topological 
space. Recall (see, e.g., [Mun75]) that the path components of a topological space $X$ are
the equivalence classes of points in the space under the equivalence relation $\sim_{path}$, where,
for $x, x' \in X$, we have $x \sim_{path} x'$ if and only if there is a continuous map $\varphi : [0, 1] \to X$
so that $\varphi(0) = x$ and $\varphi(1) = x'$. In other words, two points in $X$ are in the same path
component if they are connected by a continuous path in $X$. We let $\pi_0(X)$ denote the set of
connected components of the space $X$. A crucial observation about $\pi_0$ is that it is a functor,
i.e. that not only does one construct a set (of components) associated to every topological
space, but additionally, to every continuous map $f : X \to Y$ one associates a map of sets
$\pi_0(X) \to \pi_0(Y)$. We take the point of view that this functoriality is a key property to
export from topological spaces to the finite metric spaces which form the input to clustering
algorithms. Functoriality has been a very useful framework for the discussion of a variety of
problems within mathematics over the last few decades, see [ML98].

Remark 1.1. The reader may not be familiar with the notion of functoriality, as presented 
in this paper. The fundamental idea behind functoriality is that it is crucial to study the 
maps or transformations between mathematical objects as well as the objects themselves.
Interpreted at this level of generality, one can readily see the importance of this notion via a 
number of examples. 

(1) Fourier analysis: The key observation of Fourier analysis in its various forms is that
the study of the symmetry groups of a space allows one to construct very useful and
conceptually important bases for spaces of functions on the space. So, the standard 
Fourier basis for complex valued functions on the circle is constructed by selecting 
functions which satisfy certain transformation laws under the rotational symmetries. 
This is of course also true for, e.g., the spherical harmonic basis for the functions on 
$\mathbb{R}^3$. Functoriality can be regarded as a transformation law.

(2) Normal forms for matrices: The analysis of the conjugation action of groups 
of invertible matrices on themselves gives rise to normal forms of matrices, such 
as Jordan normal form. This means that the conjugation symmetries on spaces of
matrices allow one to understand their organization.

(3) Algebraic topology: In algebraic topology, one associates to topological spaces al- 
gebraic objects called homology groups. It is critical for applications of these groups, 
and for computing them, that they obey a transformation law for any continuous map 
of spaces. This transformation law turns out to be very powerful, in that it is a 
key factor in making a small axiomatic description of the homology groups. Just as 
transformation laws for symmetries determine the Fourier basis, so the functoriality 
determines the homology groups. 

(4) Galois theory: In Galois theory, groups of symmetries allow one to describe the 
sets of solutions of algebraic equations over fields, such as the rational numbers. The 
observation that equations were associated to symmetry groups revolutionized number 
theory, and to this day the study of various kinds of transformation laws in modular 
forms continues to have a powerful impact on the field. 

The point of the present paper is to illustrate that functoriality allows a very useful 
framework for classifying large families of clustering schemes. This framework will include 
clustering schemes which are sensitive to effects by density within the domain data set. 
Moreover, the functoriality of the construction of path connected components is a key feature 
in all aspects of topology, and the functoriality of clustering constructions should play a 
similar role in the computational topology of finite data sets. 

Then, we require that a standard clustering algorithm $\mathfrak{C}$ be a rule which assigns to every
finite metric space $X$ a set of clusters $\mathfrak{C}(X)$, subject to the additional requirement that every
map $f : X \to Y$ induce a map of sets $\mathfrak{C}(f) : \mathfrak{C}(X) \to \mathfrak{C}(Y)$.

The question is what the replacement for continuous maps (i.e. morphisms of topological 
spaces) should be, and we find that there are many choices. We consider three nested 
classes of maps, namely isometries (called iso), distance non-increasing maps (called gen, for
general), and distance non-increasing maps which are injections on the sets of points (called
inj), with the hierarchy being given by $iso \subset inj \subset gen$. We find that functoriality with
respect to gen is quite restrictive, in that it singles out single linkage clustering as the only 
functor which satisfies the gen functoriality requirement. On the other hand, functoriality 
with respect to iso appears not to be restrictive enough, in that it permits the specification 
of an arbitrary clustering on every isometry class of finite metric spaces. 

One of the features of our classification is the proof that for the gen and inj cases, a
certain implicitly defined property of clustering schemes called excisiveness, which amounts 
to requiring that the clustering method is idempotent on each of the blocks of the partition 
it generates, is equivalent to the existence of a certain kind of explicit generative model for 
the scheme. 

Mathematical results which prove the existence of a generative model given that some 
externally defined properties are satisfied are very powerful in mathematics. For example, 
the result that a system satisfying the axioms for a vector space over a field F admits a basis 
is very useful, as is the fact that a graph with no odd order cycles is bipartite. 

The equivalence between excisiveness and the existence of a generative model for the 
clustering scheme has the interpretation that excisive schemes are parametrized by sets of 
"test metric spaces", which are those connected into a single cluster by the scheme. The 
method associated to such a family is then defined using a criterion which asserts that two 
points $x$ and $x'$ in a metric space $X$ belong to the same cluster if there is a sequence of points
$\{x_0, x_1, \ldots, x_n\}$ with $x_0 = x$, $x_n = x'$, and so that for each pair $x_i, x_{i+1}$ there is a morphism
in gen or inj from one of the test metric spaces into $X$ with $x_i$ and $x_{i+1}$ in its image.
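
As an illustration of this criterion, the following minimal Python sketch assumes that the family of test metric spaces consists of a single space, namely m points at mutual distance delta, and that morphisms are taken in inj. Under that assumption, an injective distance non-increasing map into X amounts to a choice of m distinct points of X whose pairwise distances are all at most delta. The function name `clique_clusters` and the brute-force search are illustrative choices and are not taken from the paper.

```python
from itertools import combinations

def clique_clusters(points, dist, m, delta):
    """Cluster `points` using one test space: m points at mutual distance delta.

    An injective distance non-increasing map from that test space into X
    picks m distinct points whose pairwise distances are all <= delta.
    Two points are linked if some such m-subset contains both; clusters are
    the classes of the generated equivalence relation (union-find below).
    Brute force over all m-subsets, so only suitable for tiny inputs.
    """
    parent = {p: p for p in points}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for subset in combinations(points, m):
        if all(dist(a, b) <= delta for a, b in combinations(subset, 2)):
            root = find(subset[0])
            for q in subset[1:]:
                parent[find(q)] = root

    clusters = {}
    for p in points:
        clusters.setdefault(find(p), set()).add(p)
    return sorted(sorted(c) for c in clusters.values())
```

With m = 2 this reduces to chaining points at distance at most delta, while larger m requires each link to be witnessed by a denser group of points, which is one way of reading the remark that injective morphisms allow density to be taken into account.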

A key feature of the choice of morphisms in inj is that (since they are injective) this 
permits one to take account of density in a certain sense. If one considers functoriality with 
respect to gen, one finds that the morphisms might collapse a large dense collection of points 
into a single point, which makes it impossible to take density into account in building the 
clusters. Our classification of functors based on inj includes analogues of "clique clustering" 
[PDFV05] and the DBSCAN algorithm [EpKSX96], and such methods clearly have the effect
of taking a version of density into account. 

Whereas our classification of gen functorial schemes yields that single linkage is the unique 
possible scheme, which turns out to be excisive, we find large classes of non-excisive clustering 
schemes in inj. Our construction of such families relies on a study of the metric invariants 
that are well behaved under inj morphisms. Finally, it turns out that all schemes in inj that 
admit a finite generative model arise as the composition of single linkage clustering with an 
application that changes the underlying metric of the input metric space. This result has 
clear practical consequences which we discuss. 

We believe that functoriality, incorporating as it does the idea that clustering should not 
operate only on finite metric spaces as isolated objects, but rather on the metric space 
together with other metric spaces which are related to it via morphisms, is a very desirable 
property. We can point to its success within topology as an extremely powerful tool which 
pins down many constructions as the unique solution to existence problems, and which is the 
key to most applications of topological methods. It is also true that functoriality is necessary 
in order to extend the methods of clustering to higher order connectivity information with 
computational topology of finite data sets [Car09].

In this paper we demonstrate that the use of restricted notions of functoriality allows us 
to study clustering schemes which take proxies for density into account. This is important 
because it is clear that some of the value of alternate schemes such as average and complete 
linkage (which are not functorial) is due precisely to their ability to do this. Therefore, 
some of the methods we study will not suffer from the problems inherent in single linkage 
clustering, such as chaining [JS71, Section 7.4], [CM10b].

1.2. Organization of the paper. In Section 2 we discuss basic background concepts that
we use in our presentation. Section 3 introduces the concept of a category and gives many
examples and constructions we use in the paper. The concept of a functor and functoriality
are discussed in Section 4. We present our main characterization results for standard
clustering methods in Section 6. We study hierarchical methods in Section 7. Concluding
remarks are given in Section 5.



2. Preliminaries

For a finite set $X$ we denote by $\mathcal{P}(X)$ the set of all partitions of $X$. For each $k \in \mathbb{N}$, $\mathcal{M}_k$
denotes the collection of all finite metric spaces with cardinality $k$, and $\mathcal{M} = \bigcup_{k \in \mathbb{N}} \mathcal{M}_k$ denotes
the collection of all finite metric spaces. For $(X, d_X) \in \mathcal{M}$ let $\mathrm{sep}(X, d_X) := \min_{x \neq x'} d_X(x, x')$
be the separation of $X$, $\mathrm{diam}(X, d_X) := \max_{x, x' \in X} d_X(x, x')$ be the diameter of $X$, and for
$\lambda > 0$ we denote by $\lambda \cdot X$ the metric space $(X, \lambda \cdot d_X)$. For $\delta > 0$ and $k \geq 2$ we let $\Delta_k(\delta)$
denote the metric space with $k$ points whose inter-point distances all equal $\delta$.

Definition 2.1. On $(X, d_X) \in \mathcal{M}$, for each $\delta \geq 0$, we define the equivalence relation $\sim_\delta$,
where $x \sim_\delta x'$ if and only if there is a sequence $x_0, x_1, \ldots, x_k \in X$ so that $x_0 = x$, $x_k = x'$,
and $d_X(x_i, x_{i+1}) \leq \delta$ for all $i$. We denote by $[x]_\delta$ the equivalence class of $x \in X$ under $\sim_\delta$.

The following lemma follows directly from the definition of $\sim_\delta$.

Lemma 2.1. Let $(X, d_X) \in \mathcal{M}$ and $\delta \geq 0$. Then, $[x]_\delta \neq [x']_\delta$ for some $x, x' \in X$ if and only if
$d_X(a, a') > \delta$ for all $a \in [x]_\delta$ and $a' \in [x']_\delta$.

Let $X$ be any finite set and $W_X : X \times X \to \mathbb{R}^+$ be a symmetric function s.t. $W_X(x, x) = 0$
for all $x \in X$. Then, the maximal sub-dominant ultrametric relative to $W_X$ is given by

$$ (x, x') \mapsto \min \left\{ \max_{i = 0, \ldots, k-1} W_X(x_i, x_{i+1}) \;:\; \{x_i\}_{i=0}^{k} \text{ s.t. } x_0 = x \text{ and } x_k = x' \right\}. $$

This construction and its correctness are standard [NS08, §6.4.3].

We define objects which will encode the notion of "multiscale" or "multiresolution" partitions.

Definition 2.2. A persistent set is a pair $(X, \theta_X)$, where $X$ is a finite set, and $\theta_X$ is a
function from the non-negative real line $[0, +\infty)$ to $\mathcal{P}(X)$ so that the following properties
hold.

(1) If $r \leq s$, then $\theta_X(r)$ refines $\theta_X(s)$.

(2) For any $r$, there is a number $\epsilon > 0$ so that $\theta_X(r') = \theta_X(r)$ for all $r' \in [r, r + \epsilon]$.

If in addition there exists $t > 0$ s.t. $\theta_X(r)$ consists of the single block partition for all $r \geq t$,
then we say that $(X, \theta_X)$ is a dendrogram. (Footnote: In the paper we will be using the word
dendrogram to refer both to the object defined here and to the standard graphical representation of it.)

Example 2.1. Let $(X, d_X)$ be a finite metric space. Then we can associate to $(X, d_X)$ the
persistent set whose underlying set is $X$, and where for each $r \geq 0$, the blocks of the partition
$\theta_X(r)$ consist of the equivalence classes under the equivalence relation $\sim_r$.

Example 2.2. Here we consider the family of Agglomerative Hierarchical clustering techniques
[JD88]. We (re)define these by the recursive procedure described next. Let $X =
\{x_1, \ldots, x_n\}$ and let $\mathcal{L}$ denote a family of linkage functions, i.e. functions which one uses
for defining the distance between two clusters. Fix $l \in \mathcal{L}$. For each $R > 0$ consider the equivalence
relation $\sim_{l,R}$ on blocks of a partition $\Pi \in \mathcal{P}(X)$, given by $B \sim_{l,R} B'$ if and only if there
is a sequence of blocks $B = B_1, \ldots, B_s = B'$ in $\Pi$ with $l(B_k, B_{k+1}) \leq R$ for $k = 1, \ldots, s - 1$.
Consider the sequences $r_1, r_2, \ldots \in [0, \infty)$ and $\Theta_1, \Theta_2, \ldots \in \mathcal{P}(X)$ given by $\Theta_1 := \{\{x_1\}, \ldots, \{x_n\}\}$
and for $i \geq 1$, $\Theta_{i+1} = \Theta_i / \sim_{l, r_i}$, where $r_i := \min\{l(B, B') : B, B' \in \Theta_i,\ B \neq B'\}$. Finally, we
define $\theta^l : [0, \infty) \to \mathcal{P}(X)$ by $r \mapsto \theta^l(r) := \Theta_{i(r)}$ where $i(r) := \max\{i \mid r_i \leq r\}$.
Standard choices for $l$ are single linkage:

$$ l(B, B') = \min_{x \in B} \min_{x' \in B'} d_X(x, x'); $$

complete linkage:

$$ l(B, B') = \max_{x \in B} \max_{x' \in B'} d_X(x, x'); $$

and average linkage:

$$ l(B, B') = \frac{\sum_{x \in B} \sum_{x' \in B'} d_X(x, x')}{|B| \cdot |B'|}. $$

It is easily verified that the notion discussed in Example 2.1 is equivalent to $\theta^l$ when $l$ is the
single linkage function. Note that, unlike the usual definition of agglomerative hierarchical
clustering, at each step of the inductive definition we allow for more than two clusters to be
merged.
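
The recursive procedure of Example 2.2 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names (`single`, `complete`, `average`, `merge_step`, `theta`) are hypothetical, and the sketch adopts the indexing convention that the partition returned at scale r is the one obtained after performing every merge whose scale is at most r, chosen so that the single linkage case agrees with Example 2.1.

```python
from itertools import combinations

def single(B, C, d):
    return min(d(x, y) for x in B for y in C)

def complete(B, C, d):
    return max(d(x, y) for x in B for y in C)

def average(B, C, d):
    return sum(d(x, y) for x in B for y in C) / (len(B) * len(C))

def merge_step(blocks, link, d, R):
    """Quotient `blocks` by the relation generated by link(B, B') <= R."""
    blocks = [frozenset(b) for b in blocks]
    parent = list(range(len(blocks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(blocks)), 2):
        if link(blocks[i], blocks[j], d) <= R:
            parent[find(i)] = find(j)
    merged = {}
    for i, b in enumerate(blocks):
        merged[find(i)] = merged.get(find(i), frozenset()) | b
    return list(merged.values())

def theta(points, d, link, r):
    """Partition at scale r: apply every merge whose scale r_i is <= r."""
    blocks = [frozenset([p]) for p in points]
    while len(blocks) > 1:
        r_i = min(link(B, C, d) for B, C in combinations(blocks, 2))
        if r_i > r:
            break
        blocks = merge_step(blocks, link, d, r_i)
    return sorted(sorted(b) for b in blocks)
```

For three equally spaced points 0, 1, 2 on the line and complete linkage, the first merge scale is 1 and the chain of blocks {0}, {1}, {2} is merged in a single step, which is the situation in which more than two clusters are merged at once.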



3. Categories 

3.1. Definitions and Examples. In this section, we will give a brief description of the 
theory of categories and functors, which will be the framework in which we state the con- 
straints that we require of our clustering algorithms. An excellent reference for these ideas 
is [ML98].

Categories are useful mathematical constructs that encode the nature of certain objects of 
interest together with a set of admissible maps between them. This formalism is extremely 
useful for studying classes of mathematical objects which share a common structure, such 
as sets, groups, vector spaces, or topological spaces. The definition is as follows. 

Definition 3.1. A category $\underline{C}$ consists of:

• A collection of objects $ob(\underline{C})$ (e.g. sets, groups, vector spaces, etc.)

• For each pair of objects $X, Y \in ob(\underline{C})$, a set $Mor_{\underline{C}}(X, Y)$, the morphisms from $X$ to $Y$
(e.g. maps of sets from $X$ to $Y$, homomorphisms of groups from $X$ to $Y$, linear
transformations from $X$ to $Y$, etc. respectively)

• Composition operations:
$\circ : Mor_{\underline{C}}(X, Y) \times Mor_{\underline{C}}(Y, Z) \to Mor_{\underline{C}}(X, Z)$, corresponding to composition of set
maps, group homomorphisms, linear transformations, etc.

• For each object $X \in \underline{C}$, a distinguished element $id_X \in Mor_{\underline{C}}(X, X)$, called the identity
morphism.

The composition is assumed to be associative in the obvious sense, and for any $f \in Mor_{\underline{C}}(X, Y)$,
it is assumed that $id_Y \circ f = f$ and $f \circ id_X = f$.

Example 3.1 (The category of sets). Denote by Sets the category whose objects are all sets, 
and the morphisms between two sets A and B are all the functions from A to B. 

A morphism $f : X \to Y$ which has a two sided inverse $g : Y \to X$, so that $f \circ g = id_Y$ and
$g \circ f = id_X$, is called an isomorphism. Two objects which are isomorphic are intuitively
thought of as "structurally indistinguishable" in the sense that they are identical except for
naming or choice of coordinates. For example, in the category of sets, the sets $\{1, 2, 3\}$ and
$\{A, B, C\}$ are isomorphic, since they are identical except for the choice made in labelling the
elements.

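Example 3.2. Here we consider four very simple categories, that we are going to refer to
as $\underline{0}$, $\underline{1}$, $\underline{2}$ and $\underline{3}$ respectively.

• The category $\underline{0}$ has $ob(\underline{0}) = \emptyset$ and all the conditions in the definition above are
trivially satisfied.

• Consider the category $\underline{1}$ with exactly one object $A$ and one morphism: $Mor_{\underline{1}}(A, A) = \{f\}$.
It follows that $f$ must be the identity morphism $id_A$. This is represented graphically
as follows:

[Diagram: the object $A$ with the loop $id_A$.]

• The category $\underline{2}$ has exactly two objects $A$ and $B$ and three morphisms: the identities
from $A$ to $A$ and from $B$ to $B$ and exactly one morphism $f$ in $Mor_{\underline{2}}(A, B)$:

[Diagram: $A \xrightarrow{f} B$, with the loops $id_A$ and $id_B$.]

• Finally, the category $\underline{3}$ has exactly three objects $A$, $B$ and $C$ and six morphisms:
the identities from $A$ to $A$, from $B$ to $B$ and from $C$ to $C$, and three more morphisms,
$Mor_{\underline{3}}(A, B) = \{f\}$, $Mor_{\underline{3}}(B, C) = \{g\}$ and $Mor_{\underline{3}}(A, C) = \{h\}$:

[Diagram: triangle with $f : A \to B$, $g : B \to C$ and $h : A \to C$.]

Now, note that in order to satisfy the composition rule one must have $h = g \circ f$.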

Example 3.3. A more surprising example of a category is given by the following construction.
A monoid $(M, *, e)$ consists of a set $M$ equipped with a binary operation $* : M \times M \to M$
s.t. $(A * B) * C = A * (B * C)$ for all $A, B, C \in M$ and an element $e \in M$ s.t. $e * A = A * e = A$
for all $A \in M$. A monoid can be made into a category $\underline{M}$ with a single object $O$ and
$Mor_{\underline{M}}(O, O) = M$ and $id_O$ being represented by $e$. The composition of morphisms is induced
by the operation $*$.

Example 3.4. Let $E$ denote the extended positive real line $[0, +\infty]$. We construct the category
$\underline{E}$ with $ob(\underline{E}) = E$ and $Mor_{\underline{E}}(a, b) = \{(a, b)\}$ if $a \geq b$, and $Mor_{\underline{E}}(a, b) = \emptyset$ otherwise. The
composition is given by $(a, b) \circ (b, c) = (a, c)$.

Now we discuss the core constructions for this paper. 

Definition 3.2 ($\underline{C}$, a category of outputs of standard clustering schemes). Let $Y$ be
a finite set, $P_Y \in \mathcal{P}(Y)$, and $f : X \to Y$ be a set map. We define $f^*(P_Y)$ to be the partition
of $X$ whose blocks are the sets $f^{-1}(B)$ where $B$ ranges over the blocks of $P_Y$. We construct
the category $\underline{C}$ of outputs of standard clustering algorithms with $ob(\underline{C})$ equal to all possible
pairs $(X, P_X)$ where $X$ is a finite set and $P_X$ is a partition of $X$: $P_X \in \mathcal{P}(X)$. For objects
$(X, P_X)$ and $(Y, P_Y)$ one sets $Mor_{\underline{C}}((X, P_X), (Y, P_Y))$ to be the set of all maps $f : X \to Y$
with the property that $P_X$ is a refinement of $f^*(P_Y)$.

Example 3.5. Let $X$ be any finite set, $Y = \{a, b\}$ a set with two elements, and $P_X$ a
partition of $X$. Assume first that $P_Y = \{\{a\}, \{b\}\}$ and let $f : X \to Y$ be any map. Then, in
order for $f$ to be a morphism in $Mor_{\underline{C}}((X, P_X), (Y, P_Y))$ it is necessary that $x$ and $x'$ be in
different blocks of $P_X$ whenever $f(x) \neq f(x')$. Assume now that $P_Y = \{\{a, b\}\}$ and $g : Y \to X$.
Then, the condition that $g \in Mor_{\underline{C}}((Y, P_Y), (X, P_X))$ requires that $g(a)$ and $g(b)$ be in the
same block of $P_X$.
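
The pullback partition and the morphism condition of Definition 3.2 can be checked mechanically. The Python sketch below is only an illustration: set maps are represented as dictionaries, the function names are hypothetical, and empty preimages are discarded so that the result is a genuine partition when the map is not surjective.

```python
def pullback(f, X, P_Y):
    """f^*(P_Y): nonempty preimages f^{-1}(B) of the blocks B of P_Y."""
    blocks = [frozenset(x for x in X if f[x] in B) for B in P_Y]
    return {b for b in blocks if b}

def refines(P, Q):
    """True if every block of P is contained in some block of Q."""
    return all(any(set(b) <= set(c) for c in Q) for b in P)

def is_morphism_C(f, X, P_X, P_Y):
    """Check that f : (X, P_X) -> (Y, P_Y) satisfies 'P_X refines f^*(P_Y)'."""
    return refines(P_X, pullback(f, X, P_Y))
```

For instance, with X = {1, 2, 3} and P_X = {{1, 2}, {3}}, the map sending 1 and 2 to a and 3 to b is a morphism into (Y, {{a}, {b}}), whereas the map sending 1 to a and 2, 3 to b is not, in line with the first case of Example 3.5.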

We will also construct a category of persistent sets, which will constitute the output of 
hierarchical clustering functors. 

Definition 3.3 ($\underline{P}$, a category of outputs of hierarchical clustering schemes). Let
$(X, \theta_X)$, $(Y, \theta_Y)$ be persistent sets. A map of sets $f : X \to Y$ is said to be persistence
preserving if for each $r \in \mathbb{R}$, we have that $\theta_X(r)$ is a refinement of $f^*(\theta_Y(r))$. We define
a category $\underline{P}$ whose objects are persistent sets, and where $Mor_{\underline{P}}((X, \theta_X), (Y, \theta_Y))$ consists of
the set maps from $X$ to $Y$ which are persistence preserving.

It is easily verified that the composite of persistence preserving maps is persistence preserving,
and that any identity map is persistence preserving. A simple example is shown in
Figure 1.




[Figure 1: dendrograms of $(X, \theta_X)$ (left) and $(Y, \theta_Y)$ (right), with the set map $f(A) = A'$, $f(B) = B'$, $f(C) = C'$.]

Figure 1. Two persistent sets $(X, \theta_X)$ and $(Y, \theta_Y)$ represented by their dendrograms.
On the left, one defined on the set $X = \{A, B, C\}$ and on the
right, one defined on the set $Y = \{A', B', C'\}$. Consider the given set map
$f : X \to Y$. Then we see that $f$ is persistence preserving since for each
$r \geq 0$, the partition $\theta_X(r)$ is a refinement of $f^*(\theta_Y(r))$. Indeed, there are
three interesting ranges of values of $r$. Pick for example $r$ in the orange
shaded area: $r \in [1, 2)$. Then $\theta_Y(r) = \{\{A', B'\}, \{C'\}\}$ and hence
$f^*(\theta_Y(r)) = \{f^{-1}(\{A', B'\}), f^{-1}(\{C'\})\} = \{\{A, B\}, \{C\}\}$ which is indeed refined
by $\theta_X(r) = \{\{A\}, \{B\}, \{C\}\}$. One proceeds similarly for the other two
cases.
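
The persistence preserving condition can be tested by machine once a persistent set is stored as a finite list of thresholds and partitions. The Python sketch below assumes that representation (each partition is valid from its threshold up to the next one), and the function names are hypothetical. The merge heights used in the usage note are likewise assumptions, chosen only to agree with the range $r \in [1, 2)$ discussed in the caption of Figure 1.

```python
def theta_at(levels, r):
    """Value at scale r of a persistent set stored as [(threshold, partition), ...]
    with increasing thresholds starting at 0; constant on [t_i, t_{i+1})."""
    current = levels[0][1]
    for t, partition in levels:
        if t <= r:
            current = partition
    return current

def same_block(partition, a, b):
    return any(a in block and b in block for block in partition)

def is_persistence_preserving(f, levels_X, levels_Y):
    """Check that theta_X(r) refines f^*(theta_Y(r)) for every r >= 0.

    Both sides only change at stored thresholds, so checking those suffices.
    Refinement is tested blockwise: all points of a block of theta_X(r) must
    be sent by f into a single block of theta_Y(r).
    """
    thresholds = {t for t, _ in levels_X} | {t for t, _ in levels_Y}
    for r in thresholds:
        P_Y = theta_at(levels_Y, r)
        for block in theta_at(levels_X, r):
            members = list(block)
            if not all(same_block(P_Y, f[members[0]], f[x]) for x in members):
                return False
    return True
```

As a usage example, take X with A and B merging at height 2 and everything merging at height 3, and Y with A' and B' merging at height 1 and everything merging at height 2. The map f of the figure then passes the check, while its inverse from Y to X fails at r = 1, since A' and B' are already together in Y but A and B are still separate in X.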



3.2. Three categories of finite metric spaces. We will describe three categories $\underline{\mathcal{M}}^{iso}$,
$\underline{\mathcal{M}}^{inj}$, and $\underline{\mathcal{M}}^{gen}$, whose collections of objects will all consist of the collection of finite metric
spaces $\mathcal{M}$. For $(X, d_X)$ and $(Y, d_Y)$ in $\mathcal{M}$, a map $f : X \to Y$ is said to be distance
non-increasing if for all $x, x' \in X$, we have $d_Y(f(x), f(x')) \leq d_X(x, x')$. It is easy to check
that compositions of distance non-increasing maps are also distance non-increasing, and it is
also clear that $id_X$ is always distance non-increasing. We therefore have the category $\underline{\mathcal{M}}^{gen}$,
whose objects are finite metric spaces, and so that for any objects $X$ and $Y$, $Mor_{\underline{\mathcal{M}}^{gen}}(X, Y)$
is the set of distance non-increasing maps from $X$ to $Y$, cf. [Isb64] for another use of this class
of maps. It is clear that compositions of injective maps are injective, and that all identity
maps are injective, so we have the new category $\underline{\mathcal{M}}^{inj}$, in which $Mor_{\underline{\mathcal{M}}^{inj}}(X, Y)$ consists of
the injective distance non-increasing maps. Finally, if $(X, d_X)$ and $(Y, d_Y)$ are finite
metric spaces, $f : X \to Y$ is an isometry if $f$ is bijective and $d_Y(f(x), f(x')) = d_X(x, x')$ for
all $x$ and $x'$. It is clear that as above, one can form a category $\underline{\mathcal{M}}^{iso}$ whose objects are finite
metric spaces and whose morphisms are the isometries. Furthermore, one has inclusions

(3-2) $\underline{\mathcal{M}}^{iso} \subset \underline{\mathcal{M}}^{inj} \subset \underline{\mathcal{M}}^{gen}$

of subcategories (defined as in [ML98]). Note that although the inclusions are bijections
on object sets, they are proper inclusions on morphism sets, i.e. in general they are not
surjective.
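
The three morphism classes can be told apart by direct checks on a given map. The following Python sketch is an illustration under simple assumptions: spaces are lists of points with a distance function, maps are dictionaries whose values are assumed to lie in the target space, and the function names `is_gen`, `is_inj` and `is_iso` are hypothetical.

```python
from itertools import combinations

def is_gen(f, X, dX, dY):
    """f is distance non-increasing: dY(f(x), f(x')) <= dX(x, x')."""
    return all(dY(f[a], f[b]) <= dX(a, b) for a, b in combinations(X, 2))

def is_inj(f, X, dX, dY):
    """f is distance non-increasing and injective."""
    return is_gen(f, X, dX, dY) and len({f[x] for x in X}) == len(X)

def is_iso(f, X, Y, dX, dY):
    """f is a bijection onto Y that preserves all distances."""
    bijective = len({f[x] for x in X}) == len(X) == len(Y)
    return bijective and all(dY(f[a], f[b]) == dX(a, b)
                             for a, b in combinations(X, 2))
```

On the line with the usual distance, the map from {0, 2, 5} to {0, 1, 2} sending 0, 2, 5 to 0, 1, 2 is in inj but is not an isometry, a constant map is in gen but not in inj, and a translation is an isometry, which illustrates that the inclusions in (3-2) are proper on morphism sets.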

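Remark 3.1. The category $\underline{\mathcal{M}}^{gen}$ is special in that for any pair of finite metric spaces $X$
and $Y$, $Mor_{\underline{\mathcal{M}}^{gen}}(X, Y) \neq \emptyset$. Indeed, pick $y \in Y$ and define $\phi : X \to Y$ by $x \mapsto y$ for
all $x \in X$. Clearly, $\phi \in Mor_{\underline{\mathcal{M}}^{gen}}(X, Y)$. This is not the case for $\underline{\mathcal{M}}^{inj}$ since in order for
$Mor_{\underline{\mathcal{M}}^{inj}}(X, Y) \neq \emptyset$ to hold it is necessary (but not sufficient in general) that $|Y| \geq |X|$.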

4. Functors and functoriality. 

Next we introduce the key concept in our discussion, that of a functor. We give the formal 
definition first, and several examples will appear as different constructions that we use in 
the paper. 

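Definition 4.1 (Functor). Let $\underline{C}$ and $\underline{D}$ be categories. Then a functor from $\underline{C}$ to $\underline{D}$ consists
of:

• A map of sets $F : ob(\underline{C}) \to ob(\underline{D})$.

• For every pair of objects $X, Y \in \underline{C}$ a map of sets $\Phi(X, Y) : Mor_{\underline{C}}(X, Y) \to Mor_{\underline{D}}(FX, FY)$
so that

(1) $\Phi(X, X)(id_X) = id_{F(X)}$ for all $X \in ob(\underline{C})$, and

(2) $\Phi(X, Z)(g \circ f) = \Phi(Y, Z)(g) \circ \Phi(X, Y)(f)$ for all $f \in Mor_{\underline{C}}(X, Y)$ and $g \in Mor_{\underline{C}}(Y, Z)$.

Given a category $\underline{C}$, an endofunctor on $\underline{C}$ is any functor $F : \underline{C} \to \underline{C}$.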

Remark 4.1. In the interest of clarity, we will always refer to the pair $(F, \Phi)$ with a single
letter $F$. See diagram (4-4) below for an example.

Example 4.1 (Forgetful functors). When one has two categories $\underline{C}$ and $\underline{D}$, where the
objects in $\underline{C}$ are objects in $\underline{D}$ equipped with some additional structure and the morphisms in
$\underline{C}$ are simply the morphisms in $\underline{D}$ which preserve that structure, then we obtain the "forgetful
functor" from $\underline{C}$ to $\underline{D}$, which carries the object in $\underline{C}$ to the same object in $\underline{D}$, but regarded
without the additional structure. For example, a group can be regarded as a set with the
additional structure of multiplication and inverse maps, and the group homomorphisms are
simply the set maps which respect that structure. Accordingly, we have the functor from
the category of groups to the category of sets which "forgets the multiplication and inverse".
Similarly, we have the forgetful functor from $\underline{C}$ to the category of sets, which forgets the
presence of $P_X$ in the output set $(X, P_X)$.

Example 4.2. The inclusions $\underline{\mathcal{M}}^{iso} \subset \underline{\mathcal{M}}^{inj} \subset \underline{\mathcal{M}}^{gen}$ are both functors.

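Example 4.3 (Scaling functor). For any $\lambda > 0$ we define an endofunctor $\sigma_\lambda : \underline{\mathcal{M}}^{gen} \to \underline{\mathcal{M}}^{gen}$
on objects by $\sigma_\lambda(X, d_X) = (X, \lambda \cdot d_X)$ and on morphisms by $\sigma_\lambda(f) = f$. One easily verifies
that if $f$ satisfies the conditions for being a morphism in $\underline{\mathcal{M}}^{gen}$ from $(X, d_X)$ to $(Y, d_Y)$, then
it readily satisfies the conditions of being a morphism from $(X, \lambda \cdot d_X)$ to $(Y, \lambda \cdot d_Y)$. Clearly,
$\sigma_\lambda$ can also be regarded as an endofunctor in $\underline{\mathcal{M}}^{iso}$ and $\underline{\mathcal{M}}^{inj}$.

Similarly, we define a functor $s_\lambda : \underline{P} \to \underline{P}$ by setting $s_\lambda(X, \theta_X) = (X, \theta_X^\lambda)$, where $\theta_X^\lambda(r) = \theta_X(r / \lambda)$.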

4.1. Clustering algorithms as functors. The notions of categories, functors and functoriality
provide a useful framework for studying algorithms. One first defines a class of input
objects $\mathcal{I}$ and a class of output objects $\mathcal{O}$. Moreover, one associates to each of these classes a
class of natural maps, the morphisms, between objects. For the problem of hierarchical clustering, for example,
the input class is the set of finite metric spaces and the output class is that of dendrograms.
An algorithm is to be regarded as a functor between a category of input objects and a
category of output objects.

An algorithm will therefore be a procedure that assigns to each $I \in \mathcal{I}$ an output $O_I \in \mathcal{O}$
with the further property that it respects relations between objects in the following sense.
Assume $I, I' \in \mathcal{I}$ are such that there is a natural map $f : I \to I'$. Then, the algorithm has to have
the property that the relation between $O_I$ and $O_{I'}$ is represented by a natural map
between output objects.

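Example 4.4. Assume that $\underline{\mathcal{I}}$ is such that $Mor_{\underline{\mathcal{I}}}(X, Y) = \emptyset$ for all $X, Y \in \underline{\mathcal{I}}$ with $X \neq Y$.
In this case, since there are no morphisms between distinct input objects, any functor $\mathfrak{A} : \underline{\mathcal{I}} \to \underline{\mathcal{O}}$ can
be specified arbitrarily on each $X \in \underline{\mathcal{I}}$.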

We view any given clustering scheme as a procedure which takes as input a finite metric
space $(X, d_X)$, and delivers as output either an object in $\underline{C}$ or $\underline{P}$:

• Standard clustering: a pair $(X, P_X)$ where $P_X$ is a partition of $X$. Such a pair is
an object in the category $\underline{C}$.

• Hierarchical clustering: a pair $(X, \theta_X)$ where $\theta_X$ is a persistent set over $X$. Such
a pair is an object in the category $\underline{P}$.

The concept of functoriality refers to the additional condition that the clustering procedure
should map a pair of input objects into a pair of output objects in a manner which
is consistent with respect to the morphisms attached to the input and output spaces. When
this happens, we say that the clustering scheme is functorial. This notion of consistency
is made precise in Definition 4.1 and described by diagram (4-4). Let $\underline{\mathcal{M}}$ stand for any of
$\underline{\mathcal{M}}^{gen}$, $\underline{\mathcal{M}}^{inj}$ or $\underline{\mathcal{M}}^{iso}$.

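According to Definition 4.1, in order to view a standard clustering scheme as a functor
$\mathfrak{C} : \underline{\mathcal{M}} \to \underline{\mathcal{C}}$ we need to specify:

(1) how it maps objects of $\underline{\mathcal{M}}$ (finite metric spaces) into objects of $\underline{\mathcal{C}}$, and

(2) how a morphism $f : (X, d_X) \to (Y, d_Y)$ between two objects $(X, d_X)$ and $(Y, d_Y)$ in
the input category $\underline{\mathcal{M}}$ induces a map in the output category $\underline{\mathcal{C}}$, see diagram (4-3).

$$\begin{array}{ccc}
(X, d_X) & \xrightarrow{\ f\ } & (Y, d_Y) \\
\downarrow \mathfrak{C} & & \downarrow \mathfrak{C} \\
(X, P_X) & \xrightarrow{\ \mathfrak{C}(f)\ } & (Y, P_Y)
\end{array} \tag{4-3}$$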

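Similarly, in order to view a hierarchical clustering scheme as a functor $\mathfrak{H} : \underline{\mathcal{M}} \to \underline{\mathcal{P}}$ we
need to specify:

(1) how it maps objects of $\underline{\mathcal{M}}$ (finite metric spaces) into objects of $\underline{\mathcal{P}}$, and

(2) how a morphism $f : (X, d_X) \to (Y, d_Y)$ between two objects $(X, d_X)$ and $(Y, d_Y)$ in
the input category $\underline{\mathcal{M}}$ induces a map in the output category $\underline{\mathcal{P}}$, see diagram (4-4).

$$\begin{array}{ccc}
(X, d_X) & \xrightarrow{\ f\ } & (Y, d_Y) \\
\downarrow \mathfrak{H} & & \downarrow \mathfrak{H} \\
(X, \theta_X) & \xrightarrow{\ \mathfrak{H}(f)\ } & (Y, \theta_Y)
\end{array} \tag{4-4}$$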

Precise constructions will be discussed in Sections 6 and 7, where we study different
clustering algorithms, both standard (or flat) and hierarchical, using the idea of functoriality.

We have three possible "input" categories ordered by inclusion (3-2). The idea is that requiring
functoriality over a larger category is more stringent than requiring
functoriality over a smaller one. We will consider different clustering algorithms and study
whether they are functorial over our choice of the input category. The least demanding one,
$\underline{\mathcal{M}}^{iso}$, basically enforces that clustering schemes do not depend on the way points are
labeled. We will prove uniqueness results for functoriality over the most stringent category
$\underline{\mathcal{M}}^{gen}$, and finally we study how relaxing the conditions imposed by the morphisms in $\underline{\mathcal{M}}^{gen}$,
namely by restricting ourselves to the smaller, intermediate category $\underline{\mathcal{M}}^{inj}$, permits
more functorial clustering algorithms.
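
The three input categories differ only in which set maps count as morphisms. The following is a minimal sketch, assuming (as the proofs in Sections 6 and 7 use) that morphisms of $\underline{\mathcal{M}}^{gen}$ are distance non-increasing maps, that those of $\underline{\mathcal{M}}^{inj}$ are the injective ones among them, and that those of $\underline{\mathcal{M}}^{iso}$ are bijective distance-preserving maps. Representing a metric as a list of lists and a map as a tuple of indices, as well as all function names, are illustrative choices and not notation from the paper.

```python
from itertools import combinations


def is_gen(f, dX, dY):
    """True if f is distance non-increasing: d_Y(f(i), f(j)) <= d_X(i, j)."""
    n = len(dX)
    return all(dY[f[i]][f[j]] <= dX[i][j] for i, j in combinations(range(n), 2))


def is_inj(f, dX, dY):
    """True if f is injective and distance non-increasing."""
    return len(set(f)) == len(f) and is_gen(f, dX, dY)


def is_iso(f, dX, dY):
    """True if f is a bijection that preserves all distances."""
    n = len(dX)
    bijective = len(set(f)) == n == len(dY)
    return bijective and all(
        dY[f[i]][f[j]] == dX[i][j] for i, j in combinations(range(n), 2)
    )


def delta(k, eps):
    """Hypothetical helper: the k-point space with all distances equal to eps."""
    return [[0.0 if i == j else eps for j in range(k)] for i in range(k)]


X = delta(2, 2.0)   # two points at distance 2
Y = delta(2, 1.0)   # two points at distance 1
identity = (0, 1)
constant = (0, 0)

print(is_gen(identity, X, Y), is_inj(identity, X, Y), is_iso(identity, X, Y))
print(is_gen(constant, X, Y), is_inj(constant, X, Y))
print(is_gen(identity, Y, X))  # stretching distances is not allowed
```

Every isometry passes all three checks and every injective morphism passes the first two, which mirrors the inclusion of the categories: the larger the category, the more maps a functorial scheme has to respect.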

5. Invariants of metric spaces 

Now we consider maps from a metric space into the ordered extended positive real line
$E$ that behave well under morphisms of the categories $\underline{\mathcal{M}}^{iso}$, $\underline{\mathcal{M}}^{inj}$, and $\underline{\mathcal{M}}^{gen}$, respectively.
Such maps will be used later on for defining certain clustering functors, and the richness
of such maps, or lack thereof, signals the degree to which these clustering functors exhibit
different interesting behaviors.

Definition 5.1. Let $\underline{\mathcal{M}}$ be any of $\underline{\mathcal{M}}^{gen}$, $\underline{\mathcal{M}}^{inj}$ or $\underline{\mathcal{M}}^{iso}$. Then an invariant is any functor
$\mathfrak{J} : \underline{\mathcal{M}} \to E$.

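Recall that this means that $\mathfrak{J}(X) \geq \mathfrak{J}(Y)$ whenever $\mathrm{Mor}_{\underline{\mathcal{M}}}(X, Y) \neq \emptyset$. Diagrammatically,
this is the behavior we seek:

$$\begin{array}{ccc}
(X, d_X) & \longrightarrow & (Y, d_Y) \\
\downarrow & & \downarrow \\
\mathfrak{J}(X) & \geq & \mathfrak{J}(Y)
\end{array} \tag{5-5}$$

5.1. The case of $\underline{\mathcal{M}}^{iso}$. The collection of invariants in $\underline{\mathcal{M}}^{iso}$ is the largest possible among
$\underline{\mathcal{M}}^{iso}$, $\underline{\mathcal{M}}^{inj}$ and $\underline{\mathcal{M}}^{gen}$, and this can be seen as a consequence of the fact that $\underline{\mathcal{M}}^{iso}$ is the
category with the fewest morphisms, cf. (3-2). In this case all $\underline{\mathcal{M}}^{iso}$-functorial invariants
are given in the following manner: $\mathfrak{J}(X) = \Psi(d_X)$, where $\Psi : \mathbb{R}_+^{|X| \times |X|} \to E$ is such that
$\Psi(A) = \Psi(\pi^{T} \cdot A \cdot \pi)$ for any permutation matrix $\pi$ of size $|X| \times |X|$ and any $A \in \mathbb{R}_+^{|X| \times |X|}$.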

5.2. The case of $\underline{\mathcal{M}}^{inj}$.

Example 5.1. A first example of an $\underline{\mathcal{M}}^{inj}$-invariant is any non-increasing function $\zeta : \mathbb{N} \to E$
of the cardinality $|X|$ of a finite metric space $X$: clearly, since $\mathrm{Mor}_{\underline{\mathcal{M}}^{inj}}(X, Y) \neq \emptyset$ requires
that $|X| \leq |Y|$, then $\zeta(|X|) \geq \zeta(|Y|)$. Hence, the construction is functorial.

Another simple example is $\mathfrak{J}^{sep} : \underline{\mathcal{M}}^{inj} \to E$, which assigns to any finite metric space $X$ its
separation $\mathrm{sep}(X)$. This example belongs to a larger class.

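Example 5.2. For each $k \geq 2$ let $\mathfrak{J}_k : \underline{\mathcal{M}}^{inj} \to E$ be the functor given by

$$\mathfrak{J}_k(X) := \inf \{\varepsilon \geq 0 \mid \mathrm{Mor}_{\underline{\mathcal{M}}^{inj}}(\Delta_k(\varepsilon), X) \neq \emptyset\}$$

for $X \in \mathrm{ob}(\underline{\mathcal{M}}^{inj})$ with $|X| \geq k$. If $|X| < k$ we put $\mathfrak{J}_k(X) = +\infty$.

Notice that $\mathfrak{J}^{sep} = \mathfrak{J}_2$.

That this definition is indeed $\underline{\mathcal{M}}^{inj}$-functorial can be seen easily.

Proof. Assume that $X, Y \in \mathrm{ob}(\underline{\mathcal{M}}^{inj})$ and $\phi \in \mathrm{Mor}_{\underline{\mathcal{M}}^{inj}}(X, Y)$; we need to check that
$\mathfrak{J}_k(X) \geq \mathfrak{J}_k(Y)$. If $|X| < k$ then $\mathfrak{J}_k(X) = +\infty$ and there is nothing to prove.
So assume that $|X| \geq k$; since $\phi$ is injective, $|Y| \geq k$ as well. Pick $\varepsilon > \mathfrak{J}_k(X)$ and $f \in \mathrm{Mor}_{\underline{\mathcal{M}}^{inj}}(\Delta_k(\varepsilon), X)$. Then
$\phi \circ f \in \mathrm{Mor}_{\underline{\mathcal{M}}^{inj}}(\Delta_k(\varepsilon), Y)$, and hence $\varepsilon \geq \mathfrak{J}_k(Y)$. The conclusion follows since $\varepsilon > \mathfrak{J}_k(X)$
was arbitrary. □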
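
For a finite space the infimum defining $\mathfrak{J}_k$ can be evaluated directly: an injective distance non-increasing map from $\Delta_k(\varepsilon)$ into $X$ exists exactly when some $k$-point subset of $X$ has all pairwise distances at most $\varepsilon$, so $\mathfrak{J}_k(X)$ is the smallest diameter of a $k$-point subset. The sketch below relies on that observation; the function name and the two small example spaces are hypothetical, chosen only to illustrate the monotonicity proved above.

```python
from itertools import combinations
import math


def invariant_k(d, k):
    """J_k for a finite metric given as a list of lists (k >= 2).

    Returns the minimum, over all k-point subsets, of the largest pairwise
    distance inside the subset, and +infinity when the space has fewer
    than k points.
    """
    n = len(d)
    if n < k:
        return math.inf
    return min(
        max(d[i][j] for i, j in combinations(subset, 2))
        for subset in combinations(range(n), k)
    )


# Hypothetical three-point spaces; the identity map X -> Y is injective
# and distance non-increasing (only the distance 3 shrinks to 2).
X = [[0, 1, 3],
     [1, 0, 2],
     [3, 2, 0]]
Y = [[0, 1, 2],
     [1, 0, 2],
     [2, 2, 0]]

for k in (2, 3, 4):
    print(k, invariant_k(X, k), invariant_k(Y, k))
```

For $k = 2$ the function returns the separation of the space, in line with $\mathfrak{J}^{sep} = \mathfrak{J}_2$.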

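Another family of $\underline{\mathcal{M}}^{inj}$-functorial invariants with a similar structure is the following.

Example 5.3. For each $k \in \mathbb{N}$ define

$$\mathfrak{J}_k^{+}(X) := \sup \{\varepsilon \geq 0 \mid \mathrm{Mor}_{\underline{\mathcal{M}}^{inj}}(X, \Delta_k(\varepsilon)) \neq \emptyset\}$$

if $|X| \leq k$, and $\mathfrak{J}_k^{+}(X) = 0$ otherwise.

Finally, these two examples can be generalized in the following manner.

Definition 5.2. Let $\Omega \subset \underline{\mathcal{M}}$ be a collection of finite metric spaces and $\iota : \underline{\mathcal{M}}^{iso} \to E$ any
$\underline{\mathcal{M}}^{iso}$-functorial invariant. Then, one defines

$$\mathfrak{J}_{\Omega}(X) := \inf \{\iota(\omega) \mid \exists\, \omega \in \Omega \text{ s.t. } \mathrm{Mor}_{\underline{\mathcal{M}}^{inj}}(\omega, X) \neq \emptyset\}$$

and

$$\mathfrak{J}_{\Omega}^{+}(X) := \sup \{\iota(\omega) \mid \exists\, \omega \in \Omega \text{ s.t. } \mathrm{Mor}_{\underline{\mathcal{M}}^{inj}}(X, \omega) \neq \emptyset\}.$$

In these definitions we use the convention that the inf over an empty set equals $+\infty$ whereas
the sup over an empty set equals 0. The proof that these two constructions are $\underline{\mathcal{M}}^{inj}$-functorial
follows the same line of argument as for the case of $\mathfrak{J}_k$ above. That these definitions
generalize the ones of $\mathfrak{J}_k$ above can be seen by letting $\Omega = \{\Delta_k(\delta), \delta > 0\}$ and $\iota$ equal to
the separation invariant.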

We do not know a complete classification of all the $\underline{\mathcal{M}}^{inj}$-functorial invariants, but it seems
that progress could be made using our techniques. Notice that the family of $\underline{\mathcal{M}}^{inj}$-functorial
invariants is rich in the sense that it is closed under addition and multiplication.

5.3. The case of $\underline{\mathcal{M}}^{gen}$. This case is particularly easy to study: since the collection of all
morphisms is quite large (it is a superset of those of $\underline{\mathcal{M}}^{iso}$ and $\underline{\mathcal{M}}^{inj}$), one would expect
that there are relatively few $\underline{\mathcal{M}}^{gen}$-functorial invariants. This is indeed the case, and
the only $\underline{\mathcal{M}}^{gen}$-functorial invariants that are possible are the constant ones: as we saw
in Remark 3.1, one property of the category $\underline{\mathcal{M}}^{gen}$ is that for any $X, Y \in \mathrm{ob}(\underline{\mathcal{M}}^{gen})$ there
exist morphisms $\phi_1 \in \mathrm{Mor}_{\underline{\mathcal{M}}^{gen}}(X, Y)$ and $\phi_2 \in \mathrm{Mor}_{\underline{\mathcal{M}}^{gen}}(Y, X)$. Functoriality of $\mathfrak{J}$ would
then require that $\mathfrak{J}(X) \geq \mathfrak{J}(Y) \geq \mathfrak{J}(X)$, and hence, since $X, Y$ were arbitrary, $\mathfrak{J}$ must be
constant.

6. Standard Clustering 
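Definition 6.1. For each $\delta > 0$ we define the Vietoris-Rips clustering functor

$$\mathfrak{R}_\delta : \underline{\mathcal{M}}^{gen} \to \underline{\mathcal{C}}$$

as follows. For a finite metric space $(X, d_X)$, we set $\mathfrak{R}_\delta(X, d_X)$ to be $(X, P_X(\delta))$, where
$P_X(\delta)$ is the partition of $X$ associated to the equivalence relation $\sim_\delta$ defined in Example 2.1.
We define how $\mathfrak{R}_\delta$ acts on maps $f : (X, d_X) \to (Y, d_Y)$: $\mathfrak{R}_\delta(f)$ is simply the set map $f$
regarded as a morphism from $(X, P_X(\delta))$ to $(Y, P_Y(\delta))$ in $\underline{\mathcal{C}}$.

We now check that $\mathfrak{R}_\delta$ is in fact $\underline{\mathcal{M}}^{gen}$-functorial. Fix finite metric spaces $(X, d_X)$ and
$(Y, d_Y)$ and a morphism $\phi \in \mathrm{Mor}_{\underline{\mathcal{M}}^{gen}}(X, Y)$. It is enough to prove that whenever $x, x' \in X$
are s.t. $x \sim_\delta x'$, then $\phi(x) \sim_\delta \phi(x')$ in $Y$ as well. Let $x = x_0, x_1, \ldots, x_n = x'$ be points
in $X$ with $d_X(x_i, x_{i+1}) \leq \delta$ for all $i = 0, 1, \ldots, n-1$. Then, since $\phi$ is distance non-increasing,
$d_Y(\phi(x_i), \phi(x_{i+1})) \leq \delta$ for all $i = 0, 1, \ldots, n-1$ as well, and by Definition 2.1,
$\phi(x) \sim_\delta \phi(x')$.

By restricting $\mathfrak{R}_\delta$ to the subcategories $\underline{\mathcal{M}}^{iso}$ and $\underline{\mathcal{M}}^{inj}$, we obtain functors $\mathfrak{R}_\delta^{iso} : \underline{\mathcal{M}}^{iso} \to \underline{\mathcal{C}}$
and $\mathfrak{R}_\delta^{inj} : \underline{\mathcal{M}}^{inj} \to \underline{\mathcal{C}}$. We will denote all these functors by $\mathfrak{R}_\delta$ when there is no ambiguity.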
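
As a concrete illustration of the Vietoris-Rips functor and of the functoriality check above, here is a minimal sketch that computes the blocks of the relation generated by "distance at most $\delta$" with a union-find structure, and then verifies on one hypothetical distance non-increasing map that every block of the source lands inside a single block of the target. The function names, the list-of-lists metric representation and the example spaces are assumptions made for the illustration.

```python
def vietoris_rips(d, delta):
    """Partition of range(len(d)) into classes of the chain relation ~_delta."""
    n = len(d)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if d[i][j] <= delta:
                parent[find(i)] = find(j)

    blocks = {}
    for i in range(n):
        blocks.setdefault(find(i), set()).add(i)
    return {frozenset(b) for b in blocks.values()}


def respects_partitions(f, PX, PY):
    """True if f sends each block of PX into a single block of PY."""
    return all(any({f[x] for x in B} <= C for C in PY) for B in PX)


def line_metric(positions):
    """Hypothetical helper: metric of points on the real line."""
    return [[abs(p - q) for q in positions] for p in positions]


dX = line_metric([0.0, 1.0, 2.0, 4.0])
dY = line_metric([0.0, 1.0, 2.5])
f = (0, 0, 1, 2)  # distance non-increasing map X -> Y that merges two points

PX = vietoris_rips(dX, 1.0)
PY = vietoris_rips(dY, 1.0)
print(PX, PY, respects_partitions(f, PX, PY))
```

The map `f` is not injective, so it is a morphism of $\underline{\mathcal{M}}^{gen}$ but not of $\underline{\mathcal{M}}^{inj}$.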

Remark 6.1 (The Vietoris-Rips functor is surjective). Among the desirable conditions
singled out by Kleinberg [Kle02], one has that of surjectivity (which he referred to as
"richness"). Given a finite set $X$ and $P_X \in \mathcal{P}(X)$, surjectivity calls for the existence of a
metric $d_X$ on $X$ such that $\mathfrak{R}_\delta(X, d_X) = (X, P_X)$. Indeed this is the case: pick any $\alpha > 1$ and
let $d_X(x, x') = \delta$ whenever $x$ and $x'$, $x \neq x'$, are in the same block of $P_X$, and $d_X(x, x') = \alpha\delta$
when $x$ and $x'$ are in different blocks of $P_X$. With this definition it is easy to verify that $d_X$
is an ultrametric on $X$ and hence a metric. That $\mathfrak{R}_\delta(X, d_X) = (X, P_X)$ follows directly from
Lemma 2.1 and the definition of the Vietoris-Rips functor.

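For $\underline{\mathcal{M}}$ being any one of our choices $\underline{\mathcal{M}}^{iso}$, $\underline{\mathcal{M}}^{inj}$ or $\underline{\mathcal{M}}^{gen}$, a clustering functor in this
context will be denoted by $\mathfrak{C} : \underline{\mathcal{M}} \to \underline{\mathcal{C}}$. Excisiveness of a clustering functor refers to the
property that once a finite metric space has been partitioned by the clustering procedure, it
should not be further split by subsequent applications of the same algorithm.

Definition 6.2 (Excisive clustering functors). We say that a clustering functor $\mathfrak{C}$ is
excisive if for all $(X, d_X) \in \mathrm{ob}(\underline{\mathcal{M}})$, if we write $\mathfrak{C}(X, d_X) = (X, \{X_\alpha\}_{\alpha \in A})$, then

$$\mathfrak{C}\left(X_\alpha, d_X|_{X_\alpha \times X_\alpha}\right) = (X_\alpha, \{X_\alpha\}) \quad \text{for all } \alpha \in A.$$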

3 As is well known, the Vietoris-Rips functor is actually just single linkage clustering, see [CM08,
CM10a]. The name comes from certain simplicial constructions used by Vietoris in the 1930s to study Čech
cohomology theories of metric spaces. Much later, Rips used similar constructions in the context of geometric
group theory, see [Hau95].

Figure 2. Metric space used to prove that the functor $\widehat{\mathfrak{R}} : \underline{\mathcal{M}}^{inj} \to \underline{\mathcal{C}}$ is not
excisive. The metric is given by the graph distance on the graph.

Remark 6.2 (The Vietoris-Rips functor is excisive). Fix $\delta > 0$ and consider the
functor $\mathfrak{R}_\delta$ defined in Definition 6.1. We claim that $\mathfrak{R}_\delta$ is excisive. Indeed, write $\mathfrak{R}_\delta(X, d) =
(X, \{X_\alpha, \alpha \in A\})$. Then $X_\alpha$, $\alpha \in A$, are the different blocks of the partition of $X$ into
equivalence classes of $\sim_\delta$ ($x \sim_\delta x'$ if and only if $x, x' \in X_\alpha$ for some $\alpha \in A$). Fix $\alpha \in A$ and
let $x, x' \in X_\alpha$. Let $x_0, x_1, \ldots, x_m \in X$ be s.t. $x_0 = x$, $x_m = x'$ and $d_X(x_i, x_{i+1}) \leq \delta$ for
$i \in \{0, \ldots, m-1\}$. But then it follows that $x_i \in X_\alpha$ for all $i \in \{0, \ldots, m\}$ as well, and hence
$\mathfrak{R}_\delta(X_\alpha, d_X|_{X_\alpha \times X_\alpha}) = (X_\alpha, \{X_\alpha\})$.

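There exist large families of non-excisive clustering functors in $\underline{\mathcal{M}}^{inj}$.

Example 6.1 (A family of non-excisive functors in $\underline{\mathcal{M}}^{inj}$). For each invariant $\mathfrak{J} :
\underline{\mathcal{M}}^{inj} \to E$ (recall their construction in Section 5.2) and non-increasing function $\eta : E \to E$, consider
the clustering functor $\widehat{\mathfrak{R}} : \underline{\mathcal{M}}^{inj} \to \underline{\mathcal{C}}$ defined as follows: for a finite metric space
$(X, d_X)$, we define $\widehat{\mathfrak{R}}(X, d_X)$ to be $(X, P_X)$, where $P_X$ is the partition of $X$ associated to the
equivalence relation $\sim_{\eta(\mathfrak{J}(X))}$ on $X$. That $\widehat{\mathfrak{R}}$ is a functor follows from the fact that whenever
$\phi \in \mathrm{Mor}_{\underline{\mathcal{M}}^{inj}}(X, Y)$ and $x \sim_{\eta(\mathfrak{J}(X))} x'$, then $\phi(x) \sim_{\eta(\mathfrak{J}(Y))} \phi(x')$.

Indeed, let $x, x' \in X$ be s.t. $x \sim_{\eta(\mathfrak{J}(X))} x'$ and let $x = x_0, \ldots, x_n = x'$ in $X$ be s.t.
$d_X(x_i, x_{i+1}) \leq \eta(\mathfrak{J}(X))$ for $i = 0, \ldots, n-1$. Then, for $i = 0, \ldots, n-1$, $d_Y(\phi(x_i), \phi(x_{i+1})) \leq
d_X(x_i, x_{i+1}) \leq \eta(\mathfrak{J}(X))$. Since $\eta$ is non-increasing and $\mathfrak{J}(X) \geq \mathfrak{J}(Y)$, it follows that for
$i = 0, 1, \ldots, n-1$, $d_Y(\phi(x_i), \phi(x_{i+1})) \leq \eta(\mathfrak{J}(Y))$, and hence $\phi(x) \sim_{\eta(\mathfrak{J}(Y))} \phi(x')$.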

Now, the functors $\widehat{\mathfrak{R}}$ are not excisive in general. An explicit example is the following,
which applies to $\mathfrak{J} = \mathfrak{J}^{sep}$ and $\eta(z) = z^{-1}$. Consider the metric space $(X, d_X)$ depicted in
Figure 2, where the metric is given by the graph metric on the underlying graph (the
edge weights are shown in the figure). Note that $\mathrm{sep}(X, d_X) = 1/2$ and thus $\eta(\mathfrak{J}(X)) = 2$.

We then find that $\widehat{\mathfrak{R}}(X, d_X) = (X, \{\{A, B, C\}, \{D, E\}\})$. But the separation of $\{A, B, C\}$,
endowed with the restricted metric, equals 1, and hence $\eta(\mathfrak{J}(\{A, B, C\})) = 1$. Therefore,

$$\widehat{\mathfrak{R}}\left(\{A, B, C\}, d_X|_{\{A,B,C\} \times \{A,B,C\}}\right) = (\{A, B, C\}, \{\{A\}, \{B, C\}\}),$$

and we see that $\{A, B, C\}$ gets further partitioned by $\widehat{\mathfrak{R}}$.

It is interesting to point out that the same construction of a non-excisive functor in $\underline{\mathcal{M}}^{gen}$
would not work, since as we saw in Section 5.3 all the invariants $\mathfrak{J} : \underline{\mathcal{M}}^{gen} \to E$ are constant. We
will see in Section 6.5 that $\underline{\mathcal{M}}^{gen}$ permits the existence of only one (well behaved) clustering functor,
which turns out to be the Vietoris-Rips functor, which is excisive as we saw in Remark 6.2.
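
The failure of excisiveness for the separation-based functor with $\eta(z) = z^{-1}$ can be reproduced numerically. The sketch below uses a hypothetical five-point path graph A-B-C-D-E with edge weights 2, 1, 3 and 1/2; these weights are an assumption chosen to match the values quoted above (separation 1/2 for the whole space, separation 1 for $\{A, B, C\}$) and are not read off Figure 2. Function names are likewise illustrative.

```python
from itertools import combinations

# Hypothetical path graph: positions on a line realise its graph metric.
POSITION = {"A": 0.0, "B": 2.0, "C": 3.0, "D": 6.0, "E": 6.5}


def dist(x, y):
    return abs(POSITION[x] - POSITION[y])


def separation(points):
    """Smallest distance between two distinct points."""
    return min(dist(x, y) for x, y in combinations(points, 2))


def single_linkage(points, delta):
    """Blocks of the chain relation generated by dist(x, y) <= delta."""
    blocks = [{p} for p in points]
    for x, y in combinations(points, 2):
        if dist(x, y) <= delta:
            bx = next(b for b in blocks if x in b)
            by = next(b for b in blocks if y in b)
            if bx is not by:
                blocks.remove(by)
                bx |= by
    return {frozenset(b) for b in blocks}


def r_hat(points):
    """Single linkage at the data-dependent scale 1 / sep(points)."""
    return single_linkage(points, 1.0 / separation(points))


whole = r_hat(["A", "B", "C", "D", "E"])
restricted = r_hat(["A", "B", "C"])
print(whole)       # blocks {A, B, C} and {D, E}
print(restricted)  # the block {A, B, C} splits into {A} and {B, C}
```

The split happens because the threshold depends on the whole space: removing the close pair D, E raises the separation and therefore lowers the scale at which the remaining points are linked.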

6.1. The case of $\underline{\mathcal{M}}^{iso}$. It is easy to describe all $\underline{\mathcal{M}}^{iso}$-functorial clustering schemes. Let $\mathcal{X}$
denote the collection of all isometry classes of finite metric spaces. Clearly, one must specify
the clustering functor on each class of $\mathcal{X}$ in a manner which is invariant to self-symmetries of
(an element of) the class. The precise statement is as follows. For each $\zeta \in \mathcal{X}$ let $(X_\zeta, d_{X_\zeta})$
denote an element of the class $\zeta$, $G_\zeta$ the isometry group of $(X_\zeta, d_{X_\zeta})$, and $\Xi_\zeta$ the set of all
fixed points of the action of $G_\zeta$ on $\mathcal{P}(X_\zeta)$. For completeness we state as a theorem the
following obvious fact:

Theorem 6.1 (Classification of $\underline{\mathcal{M}}^{iso}$-functorial clustering schemes). Any $\underline{\mathcal{M}}^{iso}$-functorial
clustering scheme determines a choice of $p_\zeta \in \Xi_\zeta$ for each $\zeta \in \mathcal{X}$, and conversely,
a choice of $p_\zeta$ for each $\zeta \in \mathcal{X}$ determines an $\underline{\mathcal{M}}^{iso}$-functorial scheme.

6.2. Representable Clustering Functors. In what follows, $\underline{\mathcal{M}}$ is either of $\underline{\mathcal{M}}^{inj}$ or $\underline{\mathcal{M}}^{gen}$.
For each $\delta > 0$ the Vietoris-Rips functor $\mathfrak{R}_\delta : \underline{\mathcal{M}} \to \underline{\mathcal{C}}$ can be described in an alternative
way which relies on the ability to construct maps from the two-point metric space $\Delta_2(\delta)$ into
a given input finite metric space $(X, d_X)$. A first trivial observation is that the condition
that $x, x' \in X$ satisfy $d_X(x, x') \leq \delta$ is equivalent to requiring the existence of a map $f \in
\mathrm{Mor}_{\underline{\mathcal{M}}}(\Delta_2(\delta), X)$ with $\{x, x'\} \subset \mathrm{Im}(f)$. Using this observation, we can reformulate the
condition that $x \sim_\delta x'$ by the requirement that there exist $z_0, z_1, \ldots, z_k \in X$ with $z_0 = x$,
$z_k = x'$, and $f_1, f_2, \ldots, f_k \in \mathrm{Mor}_{\underline{\mathcal{M}}}(\Delta_2(\delta), X)$ with $\{z_{i-1}, z_i\} \subset \mathrm{Im}(f_i)$ for all $i = 1, 2, \ldots, k$.
Informally, this points to the interpretation that $\{\Delta_2(\delta)\}$ is the "parameter" in a "generative
model" for $\mathfrak{R}_\delta$.

This immediately suggests considering more general clustering functors constructed in the
following manner. Let $\Omega$ be any fixed collection of finite metric spaces. Define a clustering
functor $\mathfrak{C}^\Omega : \underline{\mathcal{M}} \to \underline{\mathcal{C}}$ as follows: let $(X, d) \in \mathrm{ob}(\underline{\mathcal{M}})$ and write $\mathfrak{C}^\Omega(X, d) = (X, \{X_\alpha\}_{\alpha \in A})$.
One declares that points $x$ and $x'$ belong to the same block $X_\alpha$ if and only if there exist

• a sequence of points $z_0, \ldots, z_k \in X$ with $z_0 = x$ and $z_k = x'$,

• a sequence of metric spaces $\omega_1, \ldots, \omega_k \in \Omega$ and

• for each $i = 1, \ldots, k$, pairs of points $(\alpha_i, \beta_i) \in \omega_i$ and morphisms $f_i \in \mathrm{Mor}_{\underline{\mathcal{M}}}(\omega_i, X)$
s.t. $f_i(\alpha_i) = z_{i-1}$ and $f_i(\beta_i) = z_i$.

Also, we declare that $\mathfrak{C}^\Omega(f) = f$ on morphisms $f$. Notice that above one can assume that
$z_0, z_1, \ldots, z_k$ all belong to $X_\alpha$.
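
The generative description above translates into a brute-force procedure for small inputs: enumerate every morphism from each representing space into $X$ and link all points lying in a common image; the blocks are the connected classes of this linking. The sketch below is an illustration under stated assumptions: metrics are lists of lists, the flag `injective` (a hypothetical parameter) switches between morphisms of $\underline{\mathcal{M}}^{inj}$ and $\underline{\mathcal{M}}^{gen}$, and the enumeration is exponential in the size of the representing spaces.

```python
from itertools import combinations, permutations, product


def morphisms(omega, d, injective=True):
    """Yield all distance non-increasing maps omega -> X as index tuples."""
    k, n = len(omega), len(d)
    maps = permutations(range(n), k) if injective else product(range(n), repeat=k)
    for f in maps:
        if all(d[f[a]][f[b]] <= omega[a][b] for a, b in combinations(range(k), 2)):
            yield f


def representable_clustering(Omega, d, injective=True):
    """Blocks of the clustering represented by the collection Omega."""
    n = len(d)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for omega in Omega:
        for f in morphisms(omega, d, injective):
            for x in f[1:]:
                # All points in the image of one morphism share a block.
                parent[find(x)] = find(f[0])

    blocks = {}
    for i in range(n):
        blocks.setdefault(find(i), set()).add(i)
    return {frozenset(b) for b in blocks.values()}


def delta(k, eps):
    """The k-point space with all distances equal to eps."""
    return [[0.0 if i == j else eps for j in range(k)] for i in range(k)]


chain = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]   # three points on a line, spacing 1
tight = delta(3, 1.0)                       # equilateral triangle of side 1

print(representable_clustering([delta(2, 1.0)], chain))  # one block (single linkage)
print(representable_clustering([delta(3, 1.0)], chain))  # three singletons
print(representable_clustering([delta(3, 1.0)], tight))  # one block
```

With the two-point space as the only representing space this reduces to single linkage at scale $\delta$, while the three-point space only links points that sit in a sufficiently tight triple.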

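Remark 6.3. Let $\Omega$ be a collection of finite metric spaces. Consider the clustering functor
$\mathfrak{C}^\Omega : \underline{\mathcal{M}} \to \underline{\mathcal{C}}$ constructed as above. Then, clearly, $\mathfrak{C}^\Omega(\omega, d_\omega) = (\omega, \{\omega\})$ for all $\omega \in \Omega$. As
a consequence, we have the following: if $X$ is any finite metric space and $\omega \in \Omega$, then all
$f \in \mathrm{Mor}_{\underline{\mathcal{M}}}(\omega, X)$ are s.t. $\mathrm{Im}(f)$ is fully contained in a single block of the partition of $X$ by
$\mathfrak{C}^\Omega$.

Remark 6.4. Let $\Omega \supset \Omega' \neq \emptyset$. Then $\mathfrak{C}^{\Omega'}$ produces partitions that are not coarser than those
produced by $\mathfrak{C}^{\Omega}$.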

Definition 6.3. We say that a clustering functor $\mathfrak{C}$ is representable whenever there exists
a collection of finite metric spaces $\Omega$ such that $\mathfrak{C} = \mathfrak{C}^\Omega$. In this case, we say that $\mathfrak{C}$ is
represented by $\Omega$. We say that $\mathfrak{C}$ is finitely representable whenever $\mathfrak{C} = \mathfrak{C}^\Omega$ for some
finite collection of finite metric spaces $\Omega$.

As we saw above, the Vietoris-Rips functor $\mathfrak{R}_\delta$ is (finitely) represented by $\{\Delta_2(\delta)\}$.

6.3. Representability and Excisiveness. We now prove that representability and excisiveness
are equivalent properties. Notice that excisiveness is an axiomatic statement whereas
representability asserts the existence of a generative model for the clustering functor.

Theorem 6.2. Let $\underline{\mathcal{M}}$ be either of $\underline{\mathcal{M}}^{inj}$ or $\underline{\mathcal{M}}^{gen}$. Then any clustering functor on $\underline{\mathcal{M}}$ is
excisive if and only if it is representable.

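Proof. Assume that $\mathfrak{C}$ is an excisive clustering functor. Let

$$\Omega := \{(B, d|_{B \times B}),\ B \in P,\ \text{where } (X, P) = \mathfrak{C}(X, d) \text{ and } (X, d) \in \underline{\mathcal{M}}\}. \tag{6-6}$$

That is, $\Omega$ contains all the possible blocks (endowed with restricted metrics) of all partitions
obtained by applying $\mathfrak{C}$ to all finite metric spaces. Notice that by definition of $\Omega$, and since
$\mathfrak{C}$ is excisive,

$$\mathfrak{C}(\omega, d_\omega) = (\omega, \{\omega\}) \quad \text{for all } \omega \in \Omega. \tag{6-7}$$

We will now prove that $\mathfrak{C}$ is represented by $\Omega$, that is $\mathfrak{C} = \mathfrak{C}^\Omega$. Fix any finite metric space
$(X, d_X)$ and write $(X, P) = \mathfrak{C}(X, d_X)$ and $(X, P') = \mathfrak{C}^\Omega(X, d_X)$. We know that $\Omega$ contains
all blocks $B \in P$, and since the inclusion $B \hookrightarrow X$ is a morphism in $\underline{\mathcal{M}}^{inj}$, by Remark 6.3
one sees that $\mathfrak{C}^\Omega(B, d_X|_{B \times B}) = (B, \{B\})$. Now, by functoriality of $\mathfrak{C}^\Omega$, any two points $x, x'$ in
$B$ must belong to the same block of $P'$. Since $B$ was any block of $P$ it follows that $P$ is a
refinement of $P'$.

This is depicted by the diagram below:

$$\begin{array}{ccc}
(B, d_X|_{B \times B}) & \hookrightarrow & (X, d_X) \\
\downarrow \mathfrak{C}^\Omega & & \downarrow \mathfrak{C}^\Omega \\
(B, \{B\}) & \longrightarrow & (X, P')
\end{array} \tag{6-8}$$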

Conversely, assume $x, x' \in B' \in P'$ and let $\omega_1, \ldots, \omega_k \in \Omega$, $(\alpha_i, \beta_i) \in \omega_i$ for $i = 1, 2, \ldots, k$,
$z_0, z_1, \ldots, z_k \in X$ and $f_1, f_2, \ldots, f_k$ be s.t. $z_0 = x$, $z_k = x'$, $f_i \in \mathrm{Mor}_{\underline{\mathcal{M}}}(\omega_i, X)$, $f_i(\alpha_i) = z_{i-1}$ and $f_i(\beta_i) = z_i$ for
$i = 1, 2, \ldots, k$. Consider for each $i \in \{1, 2, \ldots, k\}$ the diagram below, where we have
used (6-7). Then, since for each $i \in \{1, \ldots, k\}$ the points $\alpha_i, \beta_i$ are in the same (unique) block of $\omega_i$,
$z_{i-1}$ and $z_i$ have to be in the same block of $P$.

$$\begin{array}{ccc}
(\omega_i, d_{\omega_i}) & \xrightarrow{\ f_i\ } & (X, d_X) \\
\downarrow \mathfrak{C} & & \downarrow \mathfrak{C} \\
(\omega_i, \{\omega_i\}) & \longrightarrow & (X, P)
\end{array}$$

Since this has to happen for all $i$, we conclude that $x = z_0$ and $z_k = x'$ have to be in the
same block of $P$. Thus $P'$ is a refinement of $P$. We have proved that $P$ refines and is refined
by $P'$, and therefore these two partitions must be equal. Since $X$ was any finite metric space,
we conclude that $\mathfrak{C}$ is represented by $\Omega$ and we are done.

Assume now that $\mathfrak{C}$ is representable, i.e. $\mathfrak{C} = \mathfrak{C}^\Omega$ for some collection of finite metric
spaces $\Omega$. We will prove that $\mathfrak{C}$ is excisive. Fix a finite metric space $(X, d_X)$ and write
$\mathfrak{C}^\Omega(X, d_X) = (X, P)$. Pick any block $B$ of $P$ and any two points $x, x' \in B$. Let $\omega_1, \ldots, \omega_k \in \Omega$,
$f_1 \in \mathrm{Mor}_{\underline{\mathcal{M}}}(\omega_1, X), \ldots, f_k \in \mathrm{Mor}_{\underline{\mathcal{M}}}(\omega_k, X)$; $(\alpha_1, \beta_1) \in \omega_1, \ldots, (\alpha_k, \beta_k) \in \omega_k$; and $z_0, z_1, \ldots, z_k$
be s.t. $z_0 = x$, $z_k = x'$, $f_i(\alpha_i) = z_{i-1}$, $f_i(\beta_i) = z_i$ for all $i = 1, 2, \ldots, k$. One can assume that
all $z_i \in B$ and moreover, by Remark 6.3, $\mathrm{Im}(f_i) \subset B$. Hence, we can regard each $f_i$ as a
morphism in $\mathrm{Mor}_{\underline{\mathcal{M}}}(\omega_i, B)$, where $B$ is endowed with the restricted metric.

By Remark 6.3, $\mathfrak{C}(\omega, d_\omega) = (\omega, \{\omega\})$ for all $\omega \in \Omega$. Hence, by functoriality we have that
$x, x' \in B$ are in the same block of $P_B$, where $(B, P_B) = \mathfrak{C}^\Omega(B, d_X|_{B \times B})$. Since $x, x' \in B$ were
arbitrary we conclude that $P_B$ consists of only one block. The arbitrariness in the choices of
$B \in P$ and $X \in \mathrm{ob}(\underline{\mathcal{M}})$ finishes the proof that $\mathfrak{C} = \mathfrak{C}^\Omega$ is excisive. □

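6.4. A factorization theorem. We now prove that all finitely representable clustering
functors in $\underline{\mathcal{M}}^{gen}$ and $\underline{\mathcal{M}}^{inj}$ arise as the composition of the Vietoris-Rips functor with a
functor that changes the metric.

For a given collection $\Omega$ of finite metric spaces let

$$\mathfrak{T}^\Omega : \underline{\mathcal{M}} \to \underline{\mathcal{M}} \tag{6-9}$$

be the endofunctor that assigns to each finite metric space $(X, d_X)$ the metric space $(X, d_X^\Omega)$
with the same underlying set and metric given by $d_X^\Omega = \mathcal{U}(W_X)$,$^4$ where $W_X : X \times X \to \mathbb{R}_+$
is given by

$$(x, x') \mapsto \inf \{\lambda > 0 \mid \exists\, \omega \in \Omega \text{ and } \phi \in \mathrm{Mor}_{\underline{\mathcal{M}}}(\lambda \cdot \omega, X) \text{ with } \{x, x'\} \subset \mathrm{Im}(\phi)\}, \tag{6-10}$$

for $x \neq x'$, and by 0 on $\mathrm{diag}(X \times X)$. Above we assume that the inf over the empty set
equals $+\infty$. Note that $W_X(x, x') < \infty$ for all $x, x' \in X$ as long as $|\omega| \leq |X|$ for some $\omega \in \Omega$.
Also, $W_X(x, x') = \infty$ for all $x \neq x'$ when $|X| < \inf \{|\omega|, \omega \in \Omega\}$.

Theorem 6.3. Let $\underline{\mathcal{M}}$ be either $\underline{\mathcal{M}}^{gen}$ or $\underline{\mathcal{M}}^{inj}$ and $\mathfrak{C}$ be any $\underline{\mathcal{M}}$-functorial finitely representable
clustering functor represented by some $\Omega \subset \underline{\mathcal{M}}$. Then, $\mathfrak{C} = \mathfrak{R}_1 \circ \mathfrak{T}^\Omega$.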

Proof. Fix any finite metric space $(X, d_X)$ and write $\mathfrak{C}(X, d_X) = (X, P)$ and $\mathfrak{R}_1 \circ \mathfrak{T}^\Omega(X, d_X) =
(X, P')$. Assume that $x, x' \in B \in P$ and let $x = z_0, z_1, \ldots, z_k = x' \in X$, $\omega_1, \ldots, \omega_k \in \Omega$,
$f_i \in \mathrm{Mor}_{\underline{\mathcal{M}}}(\omega_i, X)$ and $(\alpha_i, \beta_i) \in \omega_i$ be s.t. $f_i(\alpha_i) = z_{i-1}$ and $f_i(\beta_i) = z_i$ for $i = 1, 2, \ldots, k$. Then,
from (6-10), for all $i = 1, 2, \ldots, k$, $W_X(z_{i-1}, z_i) \leq 1$ and in particular $\max_i W_X(z_{i-1}, z_i) \leq 1$.
It follows from (2-1) that $d_X^\Omega(x, x') \leq 1$ and consequently $x$ and $x'$ are in the same block of
$P'$. Thus we see that $P$ refines $P'$.

Assume now that $x, x' \in B' \in P'$. By definition of the Vietoris-Rips functor, $x \sim_1 x'$ in
$(X, d_X^\Omega)$, which implies the existence of $x = x_0, x_1, \ldots, x_k = x'$ in $X$ with $d_X^\Omega(x_i, x_{i+1}) \leq 1$ for
all $i = 0, \ldots, k-1$. Now, by (2-1), for each $i = 0, 1, \ldots, k-1$ there exist $y_0^{(i)}, y_1^{(i)}, \ldots, y_{k_i}^{(i)}$ in $X$
with $y_0^{(i)} = x_i$, $y_{k_i}^{(i)} = x_{i+1}$ and $W_X(y_j^{(i)}, y_{j+1}^{(i)}) \leq 1$ for $j = 0, 1, \ldots, k_i - 1$. By concatenating
all sequences $\{y_j^{(i)}\}_{j=0}^{k_i}$ for $i = 0, 1, \ldots, k-1$, we obtain a sequence $z_0, z_1, \ldots, z_N \in X$ with
$z_0 = x$, $z_N = x'$ and $W_X(z_{i-1}, z_i) \leq 1$ for $i = 1, \ldots, N$. By definition of $W_X$ and because $\Omega$ is finite, one sees
that for every $i = 1, 2, \ldots, N$ there exist $\lambda_i \in (0, 1]$, $\omega_i \in \Omega$ and $\phi_i \in \mathrm{Mor}_{\underline{\mathcal{M}}}(\lambda_i \cdot \omega_i, X)$ with
$\{z_{i-1}, z_i\} \subset \mathrm{Im}(\phi_i)$. But then, obviously, $\phi_i \in \mathrm{Mor}_{\underline{\mathcal{M}}}(\omega_i, X)$ as well, and since $\mathfrak{C}$ is represented
by $\Omega$ one sees that $x$ and $x'$ must be in the same block of $P$. This proves that $P'$ refines $P$,
and finishes the proof. □
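
The factorization can be checked numerically on small examples. The sketch below works in $\underline{\mathcal{M}}^{inj}$ and rests on two assumptions that should be read as such: $\lambda \cdot \omega$ denotes $\omega$ with all distances multiplied by $\lambda$, and $\mathcal{U}$ is the minimax-chain construction that the proof above uses through (2-1). Under these assumptions it computes $W_X$ by brute force, passes to the chain-minimax distance, applies single linkage at scale 1, and compares the result with the partition obtained straight from the definition of the represented functor. All names and the four-point example are hypothetical, and every representing space is assumed to have at least two points.

```python
from itertools import combinations, permutations
import math


def blocks_from_pairs(n, pairs):
    """Partition of range(n) generated by linked pairs (union-find)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in pairs:
        parent[find(i)] = find(j)
    blocks = {}
    for i in range(n):
        blocks.setdefault(find(i), set()).add(i)
    return {frozenset(b) for b in blocks.values()}


def direct_clustering(Omega, d):
    """Represented clustering on M^inj, straight from the definition."""
    n, pairs = len(d), []
    for omega in Omega:
        k = len(omega)
        for f in permutations(range(n), k):
            if all(d[f[a]][f[b]] <= omega[a][b] for a, b in combinations(range(k), 2)):
                pairs += [(f[0], x) for x in f[1:]]
    return blocks_from_pairs(n, pairs)


def w_matrix(Omega, d):
    """W_X(x, x'): least scaling of some omega that covers both x and x'."""
    n = len(d)
    W = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for omega in Omega:
        k = len(omega)
        for f in permutations(range(n), k):
            # Smallest lam with d(f(a), f(b)) <= lam * omega[a][b] for all pairs.
            lam = max(d[f[a]][f[b]] / omega[a][b] for a, b in combinations(range(k), 2))
            for a, b in combinations(range(k), 2):
                i, j = f[a], f[b]
                W[i][j] = W[j][i] = min(W[i][j], lam)
    return W


def minimax_chain(W):
    """Min over chains of the largest step (Floyd-Warshall style update)."""
    n = len(W)
    U = [row[:] for row in W]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                U[i][j] = min(U[i][j], max(U[i][m], U[m][j]))
    return U


def factored_clustering(Omega, d):
    """Single linkage at scale 1 applied to the changed metric."""
    U = minimax_chain(w_matrix(Omega, d))
    n = len(d)
    return blocks_from_pairs(
        n, [(i, j) for i, j in combinations(range(n), 2) if U[i][j] <= 1.0]
    )


def delta(k, eps):
    return [[0.0 if i == j else eps for j in range(k)] for i in range(k)]


# Triangle of side 1 (points 0, 1, 2) with a pendant point 3 attached to 2.
d = [[0.0, 1.0, 1.0, 2.0],
     [1.0, 0.0, 1.0, 2.0],
     [1.0, 1.0, 0.0, 1.0],
     [2.0, 2.0, 1.0, 0.0]]

for Omega in ([delta(2, 1.0)], [delta(3, 1.0)]):
    print(direct_clustering(Omega, d), factored_clustering(Omega, d))
```

On this example the two-point representing space merges everything, whereas the three-point one leaves the pendant point on its own, and in both cases the two computations agree.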

6.5. The case of M gen . 



4 Recall the definition of $\mathcal{U}$ in (2-1).

6.5.1. A Uniqueness theorem. In $\underline{\mathcal{M}}^{gen}$ clustering functors are very restricted, as reflected
by the following theorem.

Theorem 6.4. Assume that $\mathfrak{C} : \underline{\mathcal{M}}^{gen} \to \underline{\mathcal{C}}$ is a clustering functor for which there exists
$\delta_{\mathfrak{C}} > 0$ with the property that

• $\mathfrak{C}(\Delta_2(\delta))$ is in one piece for all $\delta \in [0, \delta_{\mathfrak{C}}]$, and

• $\mathfrak{C}(\Delta_2(\delta))$ is in two pieces for all $\delta > \delta_{\mathfrak{C}}$.

Then, $\mathfrak{C}$ is the Vietoris-Rips functor with parameter $\delta_{\mathfrak{C}}$, i.e. $\mathfrak{C} = \mathfrak{R}_{\delta_{\mathfrak{C}}}$.

Recall that the Vietoris-Rips functor is excisive, cf. Remark 6.2.
Remark 6.5 (About uniqueness in $\underline{\mathcal{M}}^{gen}$). For each $\delta > 0$ consider the clustering functor
$\overset{\circ}{\mathfrak{R}}_\delta$ that assigns points $x, x'$ in a metric space $(X, d_X)$ to the same block if and only if there
exist $x_0, x_1, \ldots, x_k \in X$ with $x_0 = x$, $x_k = x'$ and $d_X(x_i, x_{i+1}) < \delta$ for $i = 0, 1, \ldots, k-1$.
Note the strict inequality, cf. the definition of the Vietoris-Rips functor in Definition 6.1.

If one changes the conditions in Theorem 6.4 in such a way that

• $\mathfrak{C}(\Delta_2(\delta))$ is in one piece for all $\delta \in [0, \delta_{\mathfrak{C}})$, and

• $\mathfrak{C}(\Delta_2(\delta))$ is in two pieces for all $\delta \geq \delta_{\mathfrak{C}}$,

then $\mathfrak{C} = \overset{\circ}{\mathfrak{R}}_{\delta_{\mathfrak{C}}}$.

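Proof of Theorem 6.4. Fix a finite metric space $(X, d_X)$ and write $(X, P) = \mathfrak{C}(X, d_X)$ and
$(X, P') = \mathfrak{R}_{\delta_{\mathfrak{C}}}(X, d_X)$.

(1) Let $x, x' \in B$ for some block $B \in P'$. We will prove that $x$ and $x'$ are then contained in the
same block of $P$. Let $x_0, x_1, \ldots, x_k \in X$ be s.t. $x_0 = x$, $x_k = x'$ and $d_X(x_i, x_{i+1}) \leq \delta_{\mathfrak{C}}$ for
all $i \in \{0, \ldots, k-1\}$. We now prove that each pair $\{x_i, x_{i+1}\}$ belongs to the same block of
$P$, which will yield the claim. Indeed, since for each $i$, $d_X(x_i, x_{i+1}) \leq \delta_{\mathfrak{C}}$, then for each $i$
there exists $f_i \in \mathrm{Mor}_{\underline{\mathcal{M}}^{gen}}(\Delta_2(\delta_{\mathfrak{C}}), X)$ with $\mathrm{Im}(f_i) = \{x_i, x_{i+1}\}$. Thus, by functoriality and
the definition of $\delta_{\mathfrak{C}}$, $x_i$ and $x_{i+1}$ must fall in the same block of $P$. This concludes the first
half of the proof.

(2) Assume that $x_0, x_0' \in X$ are in different blocks of $P'$. We will prove that then $x_0$ and
$x_0'$ must belong to different blocks of $P$. Let $\delta := \min \{d_X(x, x'),\ x \in X \setminus [x_0']_{\delta_{\mathfrak{C}}},\ x' \in [x_0']_{\delta_{\mathfrak{C}}}\}$.
Notice that by Lemma 2.1, $\delta > \delta_{\mathfrak{C}}$. Write $\Delta_2(\delta) = \left(\{p, q\}, \begin{pmatrix} 0 & \delta \\ \delta & 0 \end{pmatrix}\right)$ and define the map
$\phi : X \to \Delta_2(\delta)$ given by $\phi(x) = p$ for all $x \in X \setminus [x_0']_{\delta_{\mathfrak{C}}}$, and $\phi(x') = q$ for all $x' \in [x_0']_{\delta_{\mathfrak{C}}}$. We
claim that $\phi \in \mathrm{Mor}_{\underline{\mathcal{M}}^{gen}}(X, \Delta_2(\delta))$. Indeed, if $x, x' \in X \setminus [x_0']_{\delta_{\mathfrak{C}}}$ or $x, x' \in [x_0']_{\delta_{\mathfrak{C}}}$, then $\phi(x) = \phi(x')$
and there is nothing to check. Assume then that $x \in X \setminus [x_0']_{\delta_{\mathfrak{C}}}$ and $x' \in [x_0']_{\delta_{\mathfrak{C}}}$, and note that
by definition of $\delta$, $d_X(x, x') \geq \delta$. On the other hand, $p = \phi(x)$ and $q = \phi(x')$ are at distance
$\delta$ from each other. Thus $\phi$ is distance non-increasing. Now, we apply $\mathfrak{C}$ to $X$ and $\Delta_2(\delta)$ and
note that in this case, by our assumption on $\mathfrak{C}$, since $\delta > \delta_{\mathfrak{C}}$, $\mathfrak{C}(\Delta_2(\delta))$ is in two pieces.
By functoriality we conclude that $x_0$ and $x_0'$ cannot lie in the same block of $P$. □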

6.6. Scale invariance in $\underline{\mathcal{M}}^{gen}$ and $\underline{\mathcal{M}}^{inj}$. It is interesting to consider the effect of imposing
Kleinberg's scale invariance axiom on $\underline{\mathcal{M}}^{gen}$-functorial and $\underline{\mathcal{M}}^{inj}$-functorial clustering
schemes. It turns out that in $\underline{\mathcal{M}}^{gen}$ there are only two possible clustering schemes enjoying
scale invariance, which turn out to be the trivial ones:

Theorem 6.5. Let $\mathfrak{C} : \underline{\mathcal{M}}^{gen} \to \underline{\mathcal{C}}$ be a clustering functor s.t. $\mathfrak{C} \circ \sigma_\lambda = \mathfrak{C}$ for all $\lambda > 0$.

Then, either

• $\mathfrak{C}$ assigns to each finite metric space $X$ the partition of $X$ into singletons, or

• $\mathfrak{C}$ assigns to each finite metric space the partition with only one block.

Proof. Let $\mathfrak{C} : \underline{\mathcal{M}}^{gen} \to \underline{\mathcal{C}}$ be a scale invariant clustering functor. Consider first the metric
space $\Delta_2(1)$ consisting of two points at distance 1. There are two possibilities: either (1)
$\mathfrak{C}(\Delta_2(1))$ is in one piece, or (2) $\mathfrak{C}(\Delta_2(1))$ is in two pieces.

Assume (1). By scale invariance, the partition of $\Delta_2(\delta)$ given by $\mathfrak{C}(\Delta_2(\delta))$ is in one piece
for all $\delta > 0$. Fix any finite metric space $(X, d_X)$ and write $\mathfrak{C}(X, d_X) = (X, P)$. Pick
any $x, x' \in X$, let $\delta = d_X(x, x')$, and consider the morphism $\phi \in \mathrm{Mor}_{\underline{\mathcal{M}}^{gen}}(\Delta_2(\delta), X)$ s.t.
$\phi(p) = x$ and $\phi(q) = x'$, where $p$ and $q$ are the two elements of $\Delta_2(\delta)$. By scale invariance
and functoriality, $x$ and $x'$ must then be in the same block of $P$. Since $x$ and $x'$ were arbitrary,
$P$ must be in one piece as well. It follows from the arbitrariness of $X$ that $\mathfrak{C}$ then assigns
the single block partition to any finite metric space.

Now assume (2). By scale invariance $\mathfrak{C}(\Delta_2(\delta))$ is in two pieces for all $\delta > 0$. Pick any
finite metric space $(X, d_X)$ and write $\mathfrak{C}(X, d_X) = (X, P)$. Pick any distinct $x, x' \in X$, and a partition
of $X$ into subsets $A$ and $B$ such that $x \in A$ and $x' \in B$. Let $\delta = \mathrm{sep}(X)$, and consider the
map $\psi : X \to \Delta_2(\delta)$ given by $\psi(a) = p$ for all $a \in A$ and $\psi(b) = q$ for all $b \in B$. It is
easy to verify that $\psi \in \mathrm{Mor}_{\underline{\mathcal{M}}^{gen}}(X, \Delta_2(\delta))$. By scale invariance and functoriality, $x$ and $x'$
cannot be in the same block of $P$. Since $x$ and $x'$ were arbitrary, no two points belong to
the same block of $P$, and hence $\mathfrak{C}$ assigns $X$ the partition into singletons. It follows from
the arbitrariness of $X$ that $\mathfrak{C}$ assigns to every finite metric space its partition into singletons.
This finishes the proof. □

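By refining the proof of the previous theorem, we find that the behavior of any $\underline{\mathcal{M}}^{inj}$-functorial
clustering functor is also severely restricted in each $\mathcal{M}_k$, $k \in \mathbb{N}$.$^5$

Theorem 6.6. Let $\mathfrak{C} : \underline{\mathcal{M}}^{inj} \to \underline{\mathcal{C}}$ be a clustering functor s.t. $\mathfrak{C} \circ \sigma_\lambda = \mathfrak{C}$ for all $\lambda > 0$. Let

$$K(\mathfrak{C}) := \{k \geq 2;\ \mathfrak{C}(\Delta_k(1)) \text{ is in one piece}\}.$$

• If $K(\mathfrak{C}) = \emptyset$, then $\mathfrak{C}$ assigns to each finite metric space $X$ its partition into singletons.

• Otherwise, let $k_{\mathfrak{C}} = \min K(\mathfrak{C})$. Then,

  - For each $k \geq k_{\mathfrak{C}}$, $\mathfrak{C}$ assigns to each finite metric space $X \in \mathcal{M}_k$ the partition of
    $X$ with only one block.

  - For each $2 \leq k < k_{\mathfrak{C}}$, $\mathfrak{C}$ assigns to each finite metric space $X \in \mathcal{M}_k$ the partition
    of $X$ into $k$ singletons.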

Proof. Assume first that $K(\mathfrak{C}) = \emptyset$, pick any finite metric space $X$ and let $k = |X|$. Pick
any two distinct points $x, x' \in X$. Choose any injective map $\phi : X \to \Delta_k(\mathrm{sep}(X))$ such
that $\phi(x)$ and $\phi(x')$ fall in different blocks of the partition of $\Delta_k(\mathrm{sep}(X))$ produced by $\mathfrak{C}$.
Then, functoriality guarantees that $x$ and $x'$ must belong to different blocks of the partition
of $X$ generated by $\mathfrak{C}$. Since $x, x' \in X$ were arbitrary we conclude that $\mathfrak{C}$ partitions $X$ into
singletons.

Now assume that $K(\mathfrak{C}) \neq \emptyset$. Pick $k \geq k_{\mathfrak{C}}$ and $X \in \mathcal{M}_k$. Consider any morphism $\phi \in
\mathrm{Mor}_{\underline{\mathcal{M}}^{inj}}(\Delta_k(\mathrm{diam}(X)), X)$ arising from arbitrarily choosing an injection from the underlying
set of $\Delta_k(1)$ to $X$. Then, by functoriality, this forces $\mathfrak{C}(X, d_X)$ to produce a partition of $X$
into exactly one block.



Finally, for $2 \leq k < k_{\mathfrak{C}}$, $\mathfrak{C}$ partitions $\Delta_k(1)$ into more than one piece. An argument similar to
the one given for the case $K(\mathfrak{C}) = \emptyset$ proves that then $\mathfrak{C}$ partitions any $X \in \mathcal{M}_k$ into $k$
singletons. □

5 Recall that $\mathcal{M}_k$ denotes the collection of all $k$-point metric spaces.

6.7. Clustering functors on $\underline{\mathcal{M}}^{inj}$ and density. In the case of $\underline{\mathcal{M}}^{inj}$, there is a richer class
of allowed clustering schemes that includes functors other than the Vietoris-Rips one. We
already saw in Example 6.1 that in $\underline{\mathcal{M}}^{inj}$ there are non-excisive clustering schemes that by
construction are different from the Vietoris-Rips functor. A class of excisive $\underline{\mathcal{M}}^{inj}$-functorial
clustering schemes which has practical interest is the following. Consider for each $n \in \mathbb{N}$ and
$\delta \in \mathbb{R}_+$ the clustering functor $\mathfrak{C}^{\Delta_n(\delta)}$, and notice that this clustering method implements the
notion that in order to deem two points $x$ and $x'$ in a finite metric space $X$ to be in the same
cluster one requires that they are "tightly packed together" in the sense that there exists a
sequence of points $x_0, x_1, \ldots, x_k$ connecting $x$ to $x'$ such that each consecutive pair $\{x_i, x_{i+1}\}$
is contained in a subset $S_i$ of $X$ of cardinality $n$ whose inter-point distances do not exceed $\delta$.
This notion is closely related to proposals in network clustering [PDFV05] and the DBSCAN
algorithm [EpKSX96], and for $n > 2$ provides a way to tackle the so called "chaining effect"
exhibited by single linkage clustering (i.e. $\mathfrak{C}^{\Delta_2(\delta)}$), see Figure 3. Note that Theorem 6.3
implies that for a given finite metric space $(X, d_X)$, $\mathfrak{C}^{\Delta_3(\delta)}(X, d_X)$ can be obtained by first
applying a change of metric to $X$ and then applying the standard Vietoris-Rips functor to
the resulting metric space. The change of metric specified by the theorem leads to finding
all 3-cliques in $X$ whose diameter is not greater than $\delta$.

Figure 3. Fix $\delta > 0$ and consider the two clustering functors $\mathfrak{C}^{\Delta_2(\delta)}$ and $\mathfrak{C}^{\Delta_3(\delta)}$
on $\underline{\mathcal{M}}^{inj}$. Then $\mathfrak{C}^{\Delta_2(\delta)}$ applied to both metric spaces in the figure will yield a
single cluster. Meanwhile, $\mathfrak{C}^{\Delta_3(\delta)}$ will yield a single cluster for the metric space
on the right, but it will not connect the points of the metric space on the left.
This admits the interpretation that in order for $\mathfrak{C}^{\Delta_3(\delta)}$ to produce a cluster,
a certain degree of agglomeration of the points is necessary. This reflects the
fact that this clustering scheme is more sensitive to density than its two-point
counterpart $\mathfrak{C}^{\Delta_2(\delta)} = \mathfrak{R}_\delta$. Similar considerations apply to the general case of
$\mathfrak{C}^{\Delta_k(\delta)}$, for any $k \in \mathbb{N}$.

Corollary 6.1. On $\underline{\mathcal{M}}^{inj}$, for all $m \in \mathbb{N}$

Above, we have used the shorthand notation $\mathfrak{T}_m = \mathfrak{T}^\Omega$, for $\Omega = \{\Delta_m(\delta), \delta \geq 0\}$.

The formalism of representable clustering functors allows for far more generality than that
provided by families $\{\Delta_n(\delta), n \in \mathbb{N}, \delta > 0\}$. For example, one could encode the requirement
that clusters be densely connected by considering a clustering functor represented by the
metric space in Figure 4, which consists of the vertices of an equilateral triangle of side $\delta$,
together with its center.

Figure 4. A metric space that encodes a notion of "density" different from
that provided by $\Delta_3(\delta)$.

7. Hierarchical Clustering 

Example 7.1 (A hierarchical version of the Vietoris-Rips functor). We define a
functor

$$\mathfrak{R} : \underline{\mathcal{M}}^{gen} \to \underline{\mathcal{P}}$$

as follows. For a finite metric space $(X, d_X)$, we define $\mathfrak{R}(X, d_X)$ to be the persistent set
$(X, \theta_X^{\mathfrak{R}})$, where $\theta_X^{\mathfrak{R}}(r)$ is the partition associated to the equivalence relation $\sim_r$ defined in
Example 2.1. This is clearly an object in $\underline{\mathcal{P}}$. We also define how $\mathfrak{R}$ acts on maps $f :
(X, d_X) \to (Y, d_Y)$: the value of $\mathfrak{R}(f)$ is simply the set map $f$ regarded as a morphism from
$(X, \theta_X^{\mathfrak{R}})$ to $(Y, \theta_Y^{\mathfrak{R}})$ in $\underline{\mathcal{P}}$. That it is a morphism in $\underline{\mathcal{P}}$ is easy to check.

Clearly, this functor implements the hierarchical version of single linkage clustering in the
sense that for each $\delta \geq 0$, if one writes $\mathfrak{R}_\delta(X, d_X) = (X, P_X(\delta))$, then $P_X(\delta) = \theta_X^{\mathfrak{R}}(\delta)$.

7.1. Functoriality over $\underline{\mathcal{M}}^{iso}$. This is the smallest category we will deal with. The morphisms
in $\underline{\mathcal{M}}^{iso}$ are simply the bijective maps between datasets which preserve the distance
function. As such, functoriality of a clustering algorithm over $\underline{\mathcal{M}}^{iso}$ simply means that the
output of the scheme does not depend on any artifacts in the dataset, such as the way the
points are named or the way in which they are ordered.

Agglomerative hierarchical clustering, in standard form, as described for example in [JD88],
begins with point cloud data and constructs a binary tree (or dendrogram) which describes
the merging of clusters as a threshold is increased. The lack of functoriality comes from
the fact that when a single threshold value corresponds to more than one data point, one
is forced to choose an ordering in order to decide which points to "agglomerate" first. This
can easily be modified by relaxing the requirement that the tree be binary. This is what we
did in Example 2.2. In this case, one can view these methods as functorial on $\underline{\mathcal{M}}^{iso}$, where
the functor takes its values in arbitrary rooted trees. It is understood that in this case, the
notion of morphism for the output ($\underline{\mathcal{P}}$) is simply isomorphism of rooted trees. In contrast,
we see next that amongst these methods, when we impose that they be functorial over the
larger (more demanding) category $\underline{\mathcal{M}}^{gen}$, only one of them passes the test.$^6$

7.2. Complete and Average linkage are not functorial over $\mathcal{M}^{inj}$. It is interesting to
point out why complete linkage and average linkage (agglomerative) clustering, as defined in
Example 2.2, fail to be functorial on $\mathcal{M}^{inj}$ (and therefore also on $\mathcal{M}^{gen}$). A simple
example explains this: consider the metric spaces $X = \{A, B, C\}$ with metric given by the
edge lengths $\{4, 3, 5\}$ and $Y = \{A', B', C'\}$ with metric given by the edge lengths $\{4, 3, 2\}$,
as given in Figure 5. Obviously the map $f$ from $X$ to $Y$ with $f(A) = A'$, $f(B) = B'$ and
$f(C) = C'$ is a morphism in $\mathcal{M}^{inj}$. Note that, for example, for $r = 3.5$ (shaded regions of
the dendrograms in Figure 5) we have that the partition of $X$ is $\Pi_X = \{\{A, C\}, \{B\}\}$ whereas
the partition of $Y$ is $\Pi_Y = \{\{A', B'\}, \{C'\}\}$, and thus $f^*(\Pi_Y) = \{\{A, B\}, \{C\}\}$. Therefore
$\Pi_X$ does not refine $f^*(\Pi_Y)$, as required by functoriality. The same construction yields a
counter-example for average linkage.
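
The counter-example can be checked numerically. The sketch below is illustrative only: since the figure is not reproduced here, the assignment of lengths to edges ($d_X(A,B)=4$, $d_X(A,C)=3$, $d_X(B,C)=5$ and $d_Y(A',B')=2$, $d_Y(A',C')=3$, $d_Y(B',C')=4$) is an assumption chosen to be consistent with the partitions stated in the text, the points of $Y$ are written `a, b, c`, and all function names are hypothetical. Under these assumed lengths average linkage merges all of $Y$ at exactly 3.5, so the average-linkage check uses $r = 3.2$ instead.

```python
from itertools import combinations

def linkage_partition(points, d, r, linkage=max):
    """Greedy agglomeration: merge the closest pair of clusters while its linkage value is <= r."""
    clusters = [frozenset([p]) for p in points]

    def link(a, b):
        return linkage([d[frozenset((x, y))] for x in a for y in b])

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        if link(a, b) > r:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)

def pullback(partition, f, domain):
    """Partition of the domain whose blocks are the preimages under f of the given blocks."""
    return {frozenset(x for x in domain if f[x] in block) for block in partition}

def refines(fine, coarse):
    return all(any(block <= other for other in coarse) for block in fine)

def average(values):
    return sum(values) / len(values)

# Assumed edge lengths (see the lead-in); f is injective and distance non-increasing.
d_X = {frozenset("AB"): 4, frozenset("AC"): 3, frozenset("BC"): 5}
d_Y = {frozenset("ab"): 2, frozenset("ac"): 3, frozenset("bc"): 4}
f = {"A": "a", "B": "b", "C": "c"}

def functorial_at(r, linkage):
    P_X = linkage_partition("ABC", d_X, r, linkage)
    P_Y = linkage_partition("abc", d_Y, r, linkage)
    return refines(P_X, pullback(P_Y, f, "ABC"))

complete_ok = functorial_at(3.5, max)      # complete linkage
average_ok = functorial_at(3.2, average)   # average linkage
single_ok = functorial_at(3.5, min)        # single linkage
```

With these values the refinement test fails for complete and average linkage and succeeds for single linkage, matching the discussion above.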




Footnote: The result in Theorem 7.1 is actually more powerful in that it states that there is a unique functor from
$\mathcal{M}^{gen}$ to $\underline{\mathcal{P}}$ that satisfies certain natural conditions.

7.3. Functoriality over $\mathcal{M}^{gen}$: a uniqueness theorem. We prove a theorem of the same
flavor as the main theorem of [Kle02], except that we prove existence and uniqueness on
$\mathcal{M}^{gen}$ instead of impossibility in our context.

Theorem 7.1. Let $\mathfrak{H} : \mathcal{M}^{gen} \to \underline{\mathcal{P}}$ be a hierarchical clustering functor which satisfies the
following conditions.

(I): Let $\alpha : \mathcal{M}^{gen} \to Sets$ and $\beta : \underline{\mathcal{P}} \to Sets$ be the forgetful functors $(X, d_X) \mapsto X$
and $(X, \theta_X) \mapsto X$, which forget the metric and persistent set respectively, and only
"remember" the underlying sets $X$. Then we assume that $\beta \circ \mathfrak{H} = \alpha$. This means
that the underlying set of the persistent set associated to a metric space is just the
underlying set of the metric space.

(II): For $\delta \ge 0$ let $\Delta_2(\delta) = \left(\{p, q\}, \begin{pmatrix} 0 & \delta \\ \delta & 0 \end{pmatrix}\right)$ denote the two point metric space with
underlying set $\{p, q\}$, and where $dist(p, q) = \delta$. Then $\mathfrak{H}(\Delta_2(\delta))$ is the persistent set
$(\{p, q\}, \theta_{\Delta_2(\delta)})$ whose underlying set is $\{p, q\}$ and where $\theta_{\Delta_2(\delta)}(t)$ is the partition with
one element blocks when $t < \delta$ and is the partition with a single two point block when
$t \ge \delta$.

(III): Write $\mathfrak{H}(X, d_X) = (X, \theta_X^{\mathfrak{H}})$, then for any $t < sep(X)$, the partition $\theta_X^{\mathfrak{H}}(t)$ is the
discrete partition with one element blocks.

Then $\mathfrak{H}$ is equal to the functor $\mathfrak{R}^{gen}$.

We should point out that another characterization of single linkage has been obtained in
the book [JS71].

Proof of Theorem 7.1. Write $\mathfrak{H}(X, d_X) = (X, \theta_X^{\mathfrak{H}})$. For each $r \ge 0$ we will prove that (a)
$\theta_X^{R}(r)$ is a refinement of $\theta_X^{\mathfrak{H}}(r)$ and (b) $\theta_X^{\mathfrak{H}}(r)$ is a refinement of $\theta_X^{R}(r)$. Then it will follow
that $\theta_X^{R}(r) = \theta_X^{\mathfrak{H}}(r)$ for all $r \ge 0$, which shows that the objects are the same. Since this is
a situation where, given any pair of objects, there is at most one morphism between them,
this also determines the effect of the functor on morphisms.

Fix $r \ge 0$. In order to obtain (a) we need to prove that whenever $x, x' \in X$ lie in the same
block of the partition $\theta_X^{R}(r)$, they both lie in the same block of $\theta_X^{\mathfrak{H}}(r)$.

It is enough to prove the following

Claim 7.1. Whenever $d_X(x, x') \le r$ then $x$ and $x'$ lie in the same block of $\theta_X^{\mathfrak{H}}(r)$.

Indeed, if the claim is true, and $x \sim_r x'$, then one can find $x_0, x_1, \ldots, x_n$ with $x_0 = x$,
$x_n = x'$ and $d_X(x_i, x_{i+1}) \le r$ for $i = 0, 1, 2, \ldots, n-1$. Then, invoking the claim for all pairs
$(x_i, x_{i+1})$, $i = 0, \ldots, n-1$, one would find that: $x = x_0$ and $x_1$ lie in the same block of $\theta_X^{\mathfrak{H}}(r)$,
$x_1$ and $x_2$ lie in the same block of $\theta_X^{\mathfrak{H}}(r)$, ..., $x_{n-1}$ and $x_n = x'$ lie in the same block of $\theta_X^{\mathfrak{H}}(r)$.
Hence, $x$ and $x'$ lie in the same block of $\theta_X^{\mathfrak{H}}(r)$.

So, let's prove the claim. Assume $d_X(x, x') \le r$, then the function given by $p \mapsto x$, $q \mapsto x'$
is a morphism $g : \Delta_2(r) \to (X, d_X)$ in $\mathcal{M}^{gen}$. This means that we obtain a morphism

$$\mathfrak{H}(g) : \mathfrak{H}(\Delta_2(r)) \to \mathfrak{H}(X, d_X)$$

in $\underline{\mathcal{P}}$. But, by assumption (II), $p$ and $q$ lie in the same block of the partition $\theta_{\Delta_2(r)}(r)$. By
functoriality it follows that $\mathfrak{H}(g)$ is persistence preserving and hence the elements $g(p) = x$
and $g(q) = x'$ lie in the same block of $\theta_X^{\mathfrak{H}}(r)$. This concludes the proof of (a).

For condition (b), assume that $x$ and $x'$ belong to the same block of the partition
$\theta_X^{\mathfrak{H}}(r)$ for some $r \ge 0$. We will prove that necessarily $x \sim_r x'$. This of course will imply that
$x$ and $x'$ belong to the same block of $\theta_X^{R}(r)$.

Consider the metric space $(X_r, d_r)$ whose points are the equivalence classes of $X$
under the equivalence relation $\sim_r$, and where the distance $d_r(B, B')$ between two equivalence
classes $B$ and $B'$ is defined to be the maximal ultra-metric pointwise less than or equal to

$$W_X(B, B') := \min_{x \in B} \min_{x' \in B'} d_X(x, x'),$$

that is $d_r = U(W_X)$, see (2-1). It follows from the definition of $\sim_r$ and Lemma 2.1 that
if two equivalence classes are distinct, then the distance between them is $> r$. Indeed, for
otherwise there would be $x, x' \in X$ with $[x]_r \ne [x']_r$ and $d_r([x]_r, [x']_r) \le r$, which would
imply that $W_X([x]_r, [x']_r) \le r$, which is a contradiction. Hence $sep(X_r, d_r) > r$.

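Write $\mathfrak{H}(X_r, d_r) = (X_r, \theta_{X_r}^{\mathfrak{H}})$. Since $sep(X_r) > r$, hypothesis (III) now directly shows
that the blocks of the partition $\theta_{X_r}^{\mathfrak{H}}(r)$ are exactly the equivalence classes of $X$ under the
equivalence relation $\sim_r$, that is $\theta_{X_r}^{\mathfrak{H}}(r) = \theta_X^{R}(r)$. Finally, consider the morphism

$$\pi_r : (X, d_X) \to (X_r, d_r)$$

in $\mathcal{M}^{gen}$ given on elements $x \in X$ by $\pi_r(x) = [x]_r$, where $[x]_r$ denotes the equivalence class
of $x$ under $\sim_r$. By functoriality, $\mathfrak{H}(\pi_r) : (X, \theta_X^{\mathfrak{H}}) \to (X_r, \theta_{X_r}^{\mathfrak{H}})$ is persistence preserving, and
therefore, $\theta_X^{\mathfrak{H}}(r)$ is a refinement of $\theta_{X_r}^{\mathfrak{H}}(r) = \theta_X^{R}(r)$. This is depicted as follows:

$$\begin{array}{ccc}
(X, d_X) & \xrightarrow{\ \mathfrak{H}\ } & (X, \theta_X^{\mathfrak{H}}) \\
\pi_r \downarrow & & \downarrow \mathfrak{H}(\pi_r) \\
(X_r, d_r) & \xrightarrow{\ \mathfrak{H}\ } & (X_r, \theta_{X_r}^{\mathfrak{H}})
\end{array}$$

This concludes the proof of (b). □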
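
The quotient space $(X_r, d_r)$ used in part (b) can be built explicitly on a small example. This is a sketch under stated assumptions: the function names and the five-point data set are hypothetical, and the maximal ultrametric below $W_X$ is computed as the minimax chain cost, which is the standard description of the sub-dominant ultrametric rather than a formula taken from this section.

```python
from itertools import combinations

def classes_at(points, dist, r):
    """Equivalence classes of ~_r, obtained by merging blocks joined by a pair at distance <= r."""
    blocks = [{p} for p in points]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(blocks, 2):
            if any(dist(x, y) <= r for x in a for y in b):
                blocks.remove(a)
                blocks.remove(b)
                blocks.append(a | b)
                merged = True
                break
    return [frozenset(b) for b in blocks]

def quotient_ultrametric(points, dist, r):
    cls = classes_at(points, dist, r)
    # W_X(B, B'): smallest distance between a point of B and a point of B'.
    u = {(a, b): (0.0 if a == b else min(dist(x, y) for x in a for y in b))
         for a in cls for b in cls}
    # Minimax closure: u(B, B') becomes the least possible largest step over chains of classes.
    for k in cls:
        for a in cls:
            for b in cls:
                u[a, b] = min(u[a, b], max(u[a, k], u[k, b]))
    return cls, u

def dist(p, q):
    return abs(p - q)

# Hypothetical data: five points on the real line, quotient taken at r = 1.
points = [0.0, 1.0, 3.0, 7.0, 8.0]
r = 1.0
cls, d_r = quotient_ultrametric(points, dist, r)
sep = min(d_r[a, b] for a, b in combinations(cls, 2))
```

In this example the separation of the quotient exceeds $r$, which is the property that hypothesis (III) is applied to in the proof.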

7.3.1. Comments on Kleinberg's conditions. We conclude this section by observing that
analogues of the three (axiomatic) properties considered by Kleinberg in [Kle02] hold for $\mathfrak{R}^{gen}$.

• Kleinberg's first condition was scale-invariance, which asserted that if the distances
in the underlying point cloud data were multiplied by a constant positive multiple
$\lambda$, then the resulting clustering decomposition should be identical. In our case, this
is replaced by the condition (recall the notation of Example 4.3) that $\mathfrak{R}^{gen} \circ \sigma_\lambda(X, d_X) =
s_\lambda \circ \mathfrak{R}^{gen}(X, d_X)$, which is trivially satisfied.

• Kleinberg's second condition, richness, asserts that any partition of a dataset can
be obtained as the result of the given clustering scheme for some metric on the
dataset. In our context, partitions are replaced by persistent sets. Assume that there
exists $t \in \mathbb{R}_+$ s.t. $\theta_X(t)$ is the single block partition, i.e., impose that the persistent
set is a dendrogram (cf. Definition 2.2). In this case, it is easy to check that any
such persistent set can be obtained as $\mathfrak{R}^{gen}$ evaluated for some (pseudo) metric on
some dataset. Indeed, let $(X, \theta_X) \in ob(\underline{\mathcal{P}})$. Let $\epsilon_1, \ldots, \epsilon_k$ be the (finitely many)
transition/discontinuity points of $\theta_X$. For $x, x' \in X$ define $d_X(x, x') = \min\{\epsilon_i\}$ s.t.
$x, x'$ belong to the same block of $\theta_X(\epsilon_i)$.

This is a pseudo metric on $X$; we only prove the triangle inequality. Pick points $x$, $x'$
and $x''$ in $X$. Let $\epsilon_1$ and $\epsilon_2$ be minimal s.t. $x, x'$ belong to the same block of
$\theta_X(\epsilon_1)$ and $x', x''$ belong to the same block of $\theta_X(\epsilon_2)$. Let $\epsilon_{12} := \max(\epsilon_1, \epsilon_2)$.
Since $(X, \theta_X)$ is a persistent set (Definition 2.2), $\theta_X(\epsilon_{12})$ must have a block $B$ s.t.
$x, x'$ and $x''$ all lie in $B$. Hence
$d_X(x, x'') \le \epsilon_{12} \le \epsilon_1 + \epsilon_2 = d_X(x, x') + d_X(x', x'')$.

• Finally, Kleinberg's third condition, consistency, could be viewed as a rudimentary
example of functoriality. His morphisms are similar to the ones in $\mathcal{M}^{gen}$.
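
The richness construction in the second item can be tried on a small dendrogram. The sketch below is illustrative: the four-point dendrogram `theta` and the function names are hypothetical, and $d_X(x, x) = 0$ is set by convention. It builds the (pseudo) metric from the transition points and then checks that single linkage applied to that metric returns the dendrogram it started from.

```python
from itertools import combinations, product

def metric_from_dendrogram(theta):
    """d(x, x') = smallest scale at which x and x' share a block; d(x, x) = 0 by convention."""
    scales = sorted(theta)
    points = sorted(set().union(*theta[scales[0]]))
    d = {(x, x): 0 for x in points}
    for x, y in combinations(points, 2):
        eps = min(e for e in scales if any({x, y} <= block for block in theta[e]))
        d[x, y] = d[y, x] = eps
    return points, d

def single_linkage(points, d, r):
    """Partition into classes of ~_r for the metric stored in the dictionary d."""
    blocks = [{p} for p in points]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(blocks, 2):
            if any(d[x, y] <= r for x in a for y in b):
                blocks.remove(a)
                blocks.remove(b)
                blocks.append(a | b)
                merged = True
                break
    return {frozenset(b) for b in blocks}

# Hypothetical dendrogram on {a, b, c, d}: transition scale -> partition at that scale.
theta = {
    0: {frozenset("a"), frozenset("b"), frozenset("c"), frozenset("d")},
    1: {frozenset("ab"), frozenset("c"), frozenset("d")},
    2: {frozenset("ab"), frozenset("cd")},
    5: {frozenset("abcd")},
}

points, d = metric_from_dendrogram(theta)
recovered = all(single_linkage(points, d, e) == theta[e] for e in theta)
triangle = all(d[x, z] <= d[x, y] + d[y, z] for x, y, z in product(points, repeat=3))
```

Both checks succeed on this example, in line with the argument given in the text.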

7.4. Functoriality over $\mathcal{M}^{inj}$. In this section, we illustrate how relaxing the functoriality
permits more clustering algorithms. In other words, we will restrict ourselves to $\mathcal{M}^{inj}$, which
is smaller (less stringent) than $\mathcal{M}^{gen}$ but larger (more stringent) than $\mathcal{M}^{iso}$.

Fix a finite metric space $(X, d_X)$ and $r \ge 0$. For $x \in X$, let $[x]_r$ be the equivalence class
of $x$ under the equivalence relation $\sim_r$, and define $c(x) = |[x]_r|$. For any positive integer
$m$, we now define $X_m \subseteq X$ by $X_m = \{x \in X;\ c(x) \ge m\}$. We note that for any morphism
$f : X \to Y$ in $\mathcal{M}^{inj}$, we find that $f(X_m) \subseteq Y_m$. This property clearly does not hold for
more general morphisms. For every $r$, we can now define a new equivalence relation $\sim_r^m$ on
$X$, which refines $\sim_r$, by requiring that each equivalence class of $\sim_r$ which has cardinality
$\ge m$ is an equivalence class of $\sim_r^m$, and that for any $x$ for which $c(x) < m$, $x$ defines a
singleton equivalence class in $\sim_r^m$. We now obtain a new persistent set $(X, \theta_X^m)$, where $\theta_X^m(r)$
will denote the partition associated to the equivalence relation $\sim_r^m$. It is readily checked that
$X \mapsto (X, \theta_X^m)$ is functorial on $\mathcal{M}^{inj}$.

Definition 7.1. For each $m \in \mathbb{N}$ let $\mathfrak{H}^m : \mathcal{M}^{inj} \to \underline{\mathcal{P}}$ be the functor which to each finite
metric space $(X, d_X)$ assigns the persistent set $(X, \theta_X^m)$ where for each $r \ge 0$, $\theta_X^m(r)$ is the
partition of $X$ into equivalence classes of $\sim_r^m$.
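
The partition $\theta_X^m(r)$ of Definition 7.1 is obtained from the classes of $\sim_r$ by breaking every class with fewer than $m$ points into singletons. A minimal sketch, with hypothetical function names and a hypothetical six-point data set on the real line:

```python
from itertools import combinations

def single_linkage_partition(points, dist, r):
    """Classes of ~_r: merge blocks while some pair of points across them is at distance <= r."""
    blocks = [{p} for p in points]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(blocks, 2):
            if any(dist(x, y) <= r for x in a for y in b):
                blocks.remove(a)
                blocks.remove(b)
                blocks.append(a | b)
                merged = True
                break
    return {frozenset(b) for b in blocks}

def theta_m(points, dist, r, m):
    """Keep the classes of ~_r with at least m points; split the smaller ones into singletons."""
    result = set()
    for block in single_linkage_partition(points, dist, r):
        if len(block) >= m:
            result.add(block)
        else:
            result.update(frozenset([x]) for x in block)
    return result

def dist(p, q):
    return abs(p - q)

# Hypothetical data: a class of three points, a class of two points and an isolated point.
points = [0, 1, 2, 10, 11, 30]
at_m3 = theta_m(points, dist, 1.5, 3)
at_m2 = theta_m(points, dist, 1.5, 2)
```

At $m = 3$ the two-point class is split into singletons, while at $m = 2$ it survives, so larger $m$ demotes more points to singleton blocks.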

This scheme could be motivated by the intuition that one does not regard clusters of small
cardinality as significant, and therefore makes points lying in small clusters into singletons,
where one can then remove them as representing "outliers" or "noise".

Another construction arises from considerations similar to those that gave rise to the
standard clustering algorithms $\mathfrak{C}^{\Delta_m(\delta)}$ in Section 6, cf. Corollary 6.1.

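Definition 7.2. For each $m \in \mathbb{N}$ define the functor $\mathfrak{R}^{\Delta_m} : \mathcal{M}^{inj} \to \underline{\mathcal{P}}$ given by

$$\mathfrak{R}^{\Delta_m} = \mathfrak{R}^{gen} \circ T^m,$$

where we have written $T^m = T^{\Omega}$ for $\Omega = \{\Delta_m(\delta),\ \delta \ge 0\}$, cf. (6-9).

That this definition produces a functor follows directly from the fact that $\mathfrak{R}^{gen}$ and $T^m$ are
functorial.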

The resulting construction is in the same spirit as DBSCAN [EpKSX96].

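Remark 7.1. Notice first that $\mathfrak{R}^{\Delta_2} = \mathfrak{R}^{gen}$. Also, for each $m \in \mathbb{N}$, the hierarchical clustering
functor $\mathfrak{R}^{\Delta_m}$ is related to the standard clustering functor $\mathfrak{C}^{\Delta_m(\delta)}$ in the following manner.
Fix $\delta \ge 0$ and write $\mathfrak{C}^{\Delta_m(\delta)}(X, d_X) = (X, P_X)$; then $P_X = \theta_X(\delta)$, where $(X, \theta_X) = \mathfrak{R}^{\Delta_m}(X, d_X)$.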

An example of dendrograms obtained with these methods is shown in Figure 6.

8. Discussion 

The methods of this paper permit natural extensions to other situations, such as clustering
of graphs and networks with, for example, the notion of clique clustering [PDFV05] fitting
naturally into our context. It appears likely that from the point of view described here,
it will in many cases be possible, given a collection of constraints on a clustering functor,
to determine the universal one satisfying the constraints. One could therefore use sets of
constraints as the definition of clustering functors.




Figure 6. Dendrograms resulting from applying $\mathfrak{R}^{\Delta_m}$ to a randomly generated
dataset consisting of 50 points in the plane. The dataset consists of
two clusters of points which are joined by a thin chain of points with some
additional outliers. On the top row, from left to right, we show the dendrograms
corresponding to $m = 2$, $3$ and $4$. Note how increasing $m$ produces
dendrograms that exhibit more clearly clustered subsets.

In addition to producing classifications of clustering functors, functoriality is a very
desirable property when clustering is used as an ingredient in computational topology.
Functorial clustering schemes produce diagrams of sets of varying shapes which can be used to
construct simplicial complexes which in turn reflect the topology of connected components
[Car09]. Functorial schemes also can be used to construct "zig-zag diagrams" [CdS10] which
reflect the stability of clusters produced over a family of independent samples.

We believe that the conceptual framework presented here can be a useful tool in reasoning
about clustering algorithms. We have also shown that clustering methods which have some
degree of functoriality admit the possibility of a certain kind of qualitative geometric analysis
of datasets which can be quite valuable. The general idea that the morphisms between
mathematical objects (together with the notion of functoriality) are critical in many situations
is well-established in many areas of mathematics, and we would argue that it is valuable in
this statistical situation as well.